Shotgun sequencing of the vaginal microbiome reveals both a species and functional potential signature of preterm birth


An association between the vaginal microbiota and preterm birth (PTB) has been reported in several research studies. Population shifts from high proportions of lactobacilli to mixed species communities, as seen with bacterial vaginosis, have been linked to a twofold increased risk of PTB. Despite the increasing number of studies using next-generation sequencing technologies, primarily involving 16S rRNA-based approaches, to investigate the vaginal microbiota during pregnancy, no distinct microbial signature has been associated with PTB. Shotgun metagenomic sequencing offers a powerful tool to reveal community structures and their gene functions at a far greater resolution than amplicon sequencing. In this study, we employ shotgun metagenomic sequencing to compare the vaginal microbiota of women at high risk of preterm birth (n = 35) vs. a low-risk control group (n = 14). Although microbial diversity and richness did not differ between groups, there were significant differences in terms of individual species. In particular, Lactobacillus crispatus was associated with samples from a full-term pregnancy, whereas one community state-type was associated with samples from preterm pregnancies. Furthermore, predicted gene functions indicated that the functional potential of the preterm microbiota differed from that of the full-term equivalent. Taken together, we observed a discrete structural and functional difference in the microbial composition of the vagina in women who deliver preterm.

Importance: With an estimated 15 million cases annually, spontaneous preterm birth (PTB) is the leading cause of death in infants under the age of five years. The ability to accurately identify pregnancies at risk of spontaneous PTB is therefore of utmost importance. However, no single cause is attributable. Microbial infection is a known risk factor, yet the role of vaginal microbes is poorly understood. Using high-resolution DNA-sequencing techniques, we investigate the microbial communities present in the vaginal tracts of women deemed high risk for PTB. We confirm that Lactobacillus crispatus is strongly linked to full-term pregnancies, whereas other microbial communities associate with PTB. Importantly, we show that the specific functions of the microbes present in PTB samples differ from those in full-term birth (FTB) samples, highlighting the power of our sequencing approach. This information enables us to begin understanding the specific microbial traits that may be influencing PTB, beyond the presence or absence of microbial taxa.


Preterm birth (PTB) is the leading cause of mortality in infants under the age of 5 years1. The rate of spontaneous PTB, defined as the onset of labour and subsequent delivery before 37 weeks of gestation, is reported to be at one in ten pregnancies (~15 million PTB annually) worldwide2,3. In addition to increased mortality, prematurity is associated with significant morbidity in terms of chronic lung disease, increased rates of neurodevelopmental delay, and long-term health problems4,5,6. The economic burden through healthcare costs related to prematurity is estimated to be $26 billion in the United States alone7.

Several studies have investigated aspects of maternal health status, including diet8, smoking9, obesity10, stress11, age12, and previous history of obstetric complications13, as indicators of PTB. Microbial infections, specifically urinary tract infections, vaginitis, bacterial vaginosis (BV), and even periodontal disease, have also been associated with an increased risk of spontaneous preterm delivery14,15,16. The composition of the microbiota of the mother, and in particular the mother’s vaginal microbiome, has more recently been linked to spontaneous PTB3. In general, the vaginal microbiota is stable throughout pregnancy with the dominant species rarely changing17,18; however, biogeography and ethnicity can play a considerable role in determining the apparent ‘normal’ microbiota19. A study of 90 women, mainly of African American descent, found that there was no association between the vaginal microbiota and PTB20. In contrast, DiGiulio et al.21 found that changes in the vaginal populations of Caucasian women were correlated with PTB. Further studies have provided additional evidence that race is a key factor when identifying patterns linking PTB with vaginal microbiota composition22,23.

The common means of categorizing vaginal microbial communities has been to group them into community state types (CSTs). CSTs dominated by Lactobacillus species have generally been associated with positive health states for the woman24. In contrast, CST-4, which is reduced in lactobacilli and contains an increased abundance of mixed species, has been associated with poorer health outcomes. These mixed species, including Gardnerella vaginalis (assigned as Bifidobacterium vaginale in the Genome Taxonomy Database (GTDB)), Atopobium vaginae (assigned as Fannyhessea vaginae in the GTDB), Mobiluncus sp., Prevotella sp., and others, are most often associated with BV. Multiple studies have reported a twofold increased risk for PTB in women suffering from BV25,26. Despite this, large cohort studies have reported that CST-4 can also be present in a high percentage of the healthy population with no symptoms of BV27. Indeed, B. vaginale has also been frequently isolated from asymptomatic women28.

Determining a role for the vaginal microbiota in influencing pregnancy outcome has been limited by the ability to fully differentiate taxonomic groups. This resolution may prove important if discrete communities, species, or strains prove to represent a risk factor. In light of this, it is important to consider that standard 16S rRNA targeted amplicon sequencing has been reported to under-report certain subgroups of B. vaginale29, whereas targeting the cpn60 gene has improved the identification of Bifidobacterium30. Shotgun metagenomics presents an opportunity to overcome many of the limitations of amplicon sequencing, with the additional benefit of understanding the functional potential of a community. An increasing number of studies have now used this approach to elucidate the vaginal microbial communities. In particular, functional distinctions among different metagenome-assembled B. vaginale genomes from the same sample31 and an ability to sub-speciate important taxa32 provide examples of the power of this approach.

In a cohort of women with a predisposed risk for PTB, we aimed to use shotgun metagenomics to identify any vaginal microbial signatures that differ from those of a low-risk control group.


Participant data

A total of 57 participants with singleton pregnancy were successfully recruited over the study period, 20 at low risk of PTB and 37 with risk factors for PTB. Multiple swabs were collected from some women depending on clinic visit frequency, accounting for 89 total samples. To control for multiple sampling of some women and different trimester timepoints, we focused our analysis on single samples for each woman in the second trimester of pregnancy (n = 49). Of these, 8 pregnancies ultimately ended preterm (PTB) and the remaining 41 were full-term births (FTB). Of the 35 participants initially considered at risk of PTB, 7 delivered before 37 weeks’ gestation and were assigned to the at-risk PTB group (risk_PTB). The remainder were assigned to an at-risk but full-term group (risk_FTB). Of the 14 women at low risk of PTB, 13 delivered at term (no-risk_FTB). The single sample from this control group that delivered prior to 37 weeks was assigned to a distinct group, no-risk_PTB. Within the PTB samples, six deliveries were late preterm (32–37 weeks), one was a moderate preterm (31.7 weeks), and one was a preterm at 26 weeks (Table 1). None of the women had preterm premature rupture of membranes. There were no significant differences in the age, race, body mass index, or smoking status of the participants by either the grouping of risk_PTB, risk_FTB, no-risk_FTB, and no-risk_PTB or the FTB and PTB grouping. Patient demographics are outlined in Table 1. Birth weight was lower for infants in the PTB group than in the FTB group (2230.62 g vs. 3646.12 g, respectively; p < 0.001, Student’s t-test).

Table 1 Descriptive statistics of study participants.

Taxonomic analysis

Following quality filtering, a mean of 828,950 high-quality microbial sequencing reads were obtained per sample. The median numbers of observed species for the FTB and PTB groups were 168 and 185.5, respectively (Fig. 1a), with the highest number of species observed at 15 for a risk_PTB sample. There was no significant difference in α-diversity between the FTB and PTB groups (Fig. 1a); however, diversity was higher in the risk_FTB group than in the no-risk_FTB group using both Shannon and Simpson measures (Fig. 1b; p = 0.01 and p = 0.003, respectively). In terms of β-diversity, the risk_PTB communities were significantly dissimilar from both the no-risk_FTB and risk_FTB samples (Fig. 1d; p = 0.02).
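The diversity measures named above can be illustrated with a minimal sketch. It assumes the natural-log form of the Shannon index and the 1 − D form of the Simpson index, which the text does not specify; the two toy count vectors are hypothetical and are not study data.

```python
import math


def relative_abundance(counts):
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return [c / total for c in counts]


def observed_species(counts):
    """Richness: number of species with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i), assuming natural log."""
    return -sum(p * math.log(p) for p in relative_abundance(counts) if p > 0)


def simpson(counts):
    """Simpson index in the 1 - sum(p_i^2) form (an assumption here)."""
    return 1.0 - sum(p * p for p in relative_abundance(counts))


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(a) != len(b):
        raise ValueError("both samples must cover the same species")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical toy samples over the same four species.
lacto_dominated = [90, 5, 5, 0]
mixed = [25, 25, 25, 25]

print(observed_species(lacto_dominated), observed_species(mixed))
print(round(shannon(lacto_dominated), 3), round(shannon(mixed), 3))
print(round(simpson(lacto_dominated), 3), round(simpson(mixed), 3))
print(bray_curtis(lacto_dominated, mixed))
```

A community dominated by one species scores low on both α-diversity indices, whereas an even, mixed community scores high, which is the contrast these indices are used to capture.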

Fig. 1: Species diversity between study groups.

α-Diversity comparison as measured by Shannon and Simpson index, including the total observed species for either delivery outcome (a) or risk grouping (b). Significant differences as calculated by Wilcoxon’s test are noted with *p < 0.05 or **p < 0.02. The species level community dissimilarity as measured by Bray–Curtis and visualized using PCoA for either delivery outcome (c) or risk grouping (d). Ellipses are generated using the stat_ellipse function in R. Significance of group dissimilarity as calculated by PERMANOVA is identified by the given p-value. The top five lactobacilli across all samples are labelled to highlight drivers of variation for clusters, with black arrows indicating the directionality.

Taxonomic classification revealed that lactobacilli were dominant across all groups at 59.13% mean relative abundance (Fig. 2a), with a greater proportion observed in full-term pregnancies (61.86%) compared to preterm (45.11%). This dominance was greatest in the no-risk_FTB group (71.32%) where both Lactobacillus crispatus (50.86%) and Lactobacillus iners (21.27%) were dominant. Within the risk_FTB group, L. crispatus (25.83%), L. iners (14.83%), Lactobacillus gasseri (11.70%), and L. gasseri A (7.30%) were the dominant species detected. L. crispatus was never >0.03% relative abundance in the risk_PTB group. In addition, L. gasseri was only detected in two samples in the risk_PTB group, accounting for just 0.01% mean relative abundance. Although L. iners (30.82%) accounted for a greater proportion of the mean population in the risk_PTB group compared to both FTB groups, this increase was not statistically significant. Due to the large variation in both the abundance and number of species observed between samples, L. crispatus (p < 0.001), L. gasseri (p = 0.028), and Bifidobacterium breve (p = 0.036) were the only species that significantly differed between full-term and preterm groups, with a higher respective mean relative abundance (Fig. 2b). A significant negative correlation was observed between PTB samples and both L. crispatus and L. gasseri (p-value 0.03 and 0.03, respectively), and after correction for multiple comparisons, the corresponding q-values were below 0.25 (Fig. 2c, Table 2, and Supplementary Table 2). Overall, the dominance of BV-associated bacteria was low across all samples. Using the GTDB database, B. vaginale (this database assigns G. vaginalis as B. vaginale) was assigned as multiple sub-species. The two most abundant BV-associated species across all samples were B. vaginale and B. vaginale_G; however, neither was significantly increased in either FTB or PTB samples (Fig. 2b). A significant correlation with PTB samples was observed for B. vaginale_D, _E, and _F (Fig. 2c, Table 2, and Supplementary Table 2; q-value 0.097, 0.006, and 0.002, respectively). In addition, F. vaginae and F. vaginae_A (this database assigns A. vaginae as F. vaginae) were positively correlated with PTB samples (q-value 0.137 and 0.008, respectively; Fig. 2c, Table 2, and Supplementary Table 2).

Fig. 2: Species composition across study groups.

a The relative abundance for the top 30 species across all samples for each of the four risk groupings. b Comparative analysis within the delivery outcomes for the ten most abundant species across all samples. Significant differences were calculated using Student’s t-test. c MaAsLiN2 analysis correlating species with multiple metadata fixed effects. Heatmap displays the top 50 species with a significant association to either fixed effect with q-value < 0.25.

Table 2 MaAsLiN2 multivariate correlation analysis of most abundant microbial species and sample groupings.

Resolution of the microbial species into CSTs revealed that six distinct CSTs were present across all swabs (Fig. 3). CSTs were determined by the most dominant species in a cluster as previously described33 with the definition of CST-8 according to Brooks et al.24. CST-1, dominated by L. crispatus, had a weak association overall with delivery outcome (Supplementary Table 1; p = 0.09). There was a weak association between samples from the risk_PTB group and CST-5 (Supplementary Table 1; p = 0.09). There were five observations of the BV-associated CST-4, three of which were from the risk_PTB group. This CST had a positive association with samples from the risk_PTB group (p = 0.02). Neither CST-3 nor CST-8 was associated with any pregnancy outcome.
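The idea of labelling a community by its most dominant species can be sketched as follows. This is a simplified per-sample illustration and not the cluster-based procedure used in the study: the 50% dominance threshold, the function name, and the species-to-CST mapping beyond CST-1 (which follows widely used convention) are assumptions, and CST-8 is omitted because its defining taxa are not stated in this text.

```python
# Assumed mapping: CST-1 (L. crispatus) is stated in the text; the other
# entries follow commonly used convention and are assumptions here.
DOMINANT_TO_CST = {
    "Lactobacillus crispatus": "CST-1",
    "Lactobacillus gasseri": "CST-2",
    "Lactobacillus iners": "CST-3",
    "Lactobacillus jensenii": "CST-5",
}

# Hypothetical cut-off for calling a species "dominant".
DOMINANCE_THRESHOLD = 0.5


def assign_cst(profile, threshold=DOMINANCE_THRESHOLD):
    """Return a CST label for one sample.

    profile maps species name -> abundance (counts or proportions).
    Samples without a dominant mapped Lactobacillus fall back to the
    mixed-species CST-4.
    """
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile must contain a positive abundance")
    species, abundance = max(profile.items(), key=lambda item: item[1])
    if abundance / total >= threshold and species in DOMINANT_TO_CST:
        return DOMINANT_TO_CST[species]
    return "CST-4"


# Hypothetical example profiles (not study data).
crispatus_sample = {"Lactobacillus crispatus": 0.80, "Lactobacillus iners": 0.20}
mixed_sample = {
    "Bifidobacterium vaginale": 0.35,
    "Fannyhessea vaginae": 0.30,
    "Lactobacillus iners": 0.35,
}

print(assign_cst(crispatus_sample))
print(assign_cst(mixed_sample))
```

A fixed threshold keeps the sketch transparent; clustering samples first and then naming each cluster by its dominant species, as described above, avoids having to choose such a cut-off.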

Fig. 3: Community state types of all samples.

Relative heatmap intensity comparison for all species with an across-sample mean relative abundance > 0.2, with samples clustered by Pearson’s distance. Community state types are defined based on the most abundant species per sample and are indicated by the top bar of the figure.
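A Pearson-based distance between two abundance profiles can be sketched as below. It assumes the common definition of distance as 1 − r, where r is the Pearson correlation coefficient; the text does not state the exact form used, and the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / math.sqrt(var_x * var_y)


def pearson_distance(x, y):
    """Distance assumed as 1 - r: 0 for perfectly correlated profiles,
    2 for perfectly anti-correlated ones."""
    return 1.0 - pearson_r(x, y)


# Hypothetical profiles over the same three species.
print(pearson_distance([1, 2, 3], [2, 4, 6]))
print(pearson_distance([1, 2, 3], [3, 2, 1]))
```

Because the measure depends on the shape of a profile and not on its scale, two samples with the same dominant species cluster together even when their absolute abundances differ.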

Functional pathways analysis

A total of 2928 gene functions were identified across all samples. Stratification of the data into the three Gene Ontology classifications of cellular component (CC), biological process (BP), and molecular function (MF) revealed functional diversity between the risk groupings. Both Shannon and Simpson diversity indexes determined there was a significant increase in the diversity of gene functions for both BPs and MFs for PTB samples compared to FTB (Fig. 4a). When samples were stratified into risk groupings, this difference in functional diversity was not observed between the risk_FTB and no-risk_FTB groups (Supplementary Fig. 1a). The FTB and PTB samples were also functionally dissimilar to each other by Bray–Curtis measure both in the BP and MF classes (Fig. 4b; p-value 0.038 and 0.045, respectively). From ADONIS analysis, only L. gasseri was significantly influencing the variation in gene function across all three categories (p-value 0.017 (MF), 0.027 (CC), and 0.013 (BP)). Stratification of the samples into the risk groupings did not show any significant differences in diversity of function (Supplementary Fig. 1b). Using multivariate analysis, 22 CCs, 154 BPs, and 299 MFs were significantly correlated to either women who had a previous PTB, large loop excisions of the transformation zone (LLETZ), were categorized with a prior risk for PTB, subgrouped to the four risk categories according to this study, or ultimately delivered preterm (Fig. 5 and Supplementary Table 3; q-value < 0.25 after multiple correction). For gene functions with significant association to preterm delivery, the MF category contained the most differential features. Among these were genes involved in ‘2′ 3′-bisphosphoglycerate-independent phosphoglycerate mutase activity’, ‘receptor activity’, ‘aldose 1 epimerase activity’, ‘carboxyl or carbamoyltransferase activity’, and ‘copper exporting ATPase activity’ (Fig. 5 and Supplementary Table 3). The top five gene functions most associated with preterm delivery within the BP category included ‘respiratory electron transport chain’, ‘alginic acid biosynthetic process’, ‘ATP hydrolysis coupled proton transport’, ‘folic acid biosynthetic process’, and ‘glucose catabolic process’ (Fig. 5 and Supplementary Table 3).

Fig. 4: Functional diversity of the vaginal microbiome.

Gene functions are divided into three higher-level functional categories based on Gene Ontology classification with both α- (a) and β-diversity (b) measures of identified content shown. Significant differences are highlighted by an asterisk where p-value was <0.05 as calculated by Kruskal–Wallis. Significant differences in β-diversity were determined by PERMANOVA analysis. Ellipses are generated using the stat_ellipse function in R. The top seven species across all samples are labelled to highlight drivers of variation for clusters, with black arrows indicating the directionality.

Fig. 5: Multivariate association analysis.

Each heatmap presents the independent significant associations of species to grouping as determined by MaAsLiN2 with q-value < 0.25. For biological process and molecular function, only the top 50 associations are presented.


In summary, using a shotgun metagenomics approach, this study has confirmed the association of a vaginal microbiome dominated by L. crispatus with full-term pregnancies. In addition, evidence is provided that CST-4 is more associated with the vaginal microbiome from a spontaneous PTB. This study also identified functional differences in the vaginal microbiome of women who subsequently deliver preterm.

To date, numerous microbiome sequencing studies have been conducted with the aim of understanding the vaginal microbiota and its association with PTB. A limiting factor in several of these studies has been the use of targeted amplicon sequencing. In particular, it has been shown that certain key vaginal taxa can be underrepresented or not detected, depending on the gene region targeted. Of note, BV has been determined as a risk factor for PTB, yet the detection and differentiation of B. vaginale (G. vaginalis) subtypes can be difficult, with the potential to omit subtle nuances when associating vaginal communities with PTB.

In our study, we have used shotgun metagenomics to determine the vaginal communities in women at high risk of PTB. This approach has enabled a higher resolution of the species of bacteria associated with high-risk pregnancies compared to previous approaches. The relatively high microbial read depth achieved, coupled with the recently updated GTDB, has revealed a richness of species that are present at low abundance (<5%; Fig. 3). With use of the GTDB classification database, several species of B. vaginale were detected, each of which correlated differently with pregnancy outcome (Table 2 and Fig. 2c). Importantly, these sub-species were not the dominant B. vaginale, which when present (18/49 occurrences) had a mean relative abundance of 14%. In contrast, B. vaginale_D (9/49 occurrences), B. vaginale_E (8/49 occurrences), and B. vaginale_F (8/49 occurrences), which significantly correlated with PTB outcome, were never >0.2% relative abundance. At such low abundance, it remains to be determined whether these species can exert a meaningful biological influence within the microbiota. Nonetheless, this difference within the B. vaginale group highlights the importance of stratifying this species when investigating associations with vaginal health and perinatal outcome. Another important member of BV-associated bacteria, F. vaginae (A. vaginae), was identified in three of the PTB samples at relatively high abundance (4.42–8.10%), yet was rare in full-term samples at a maximum relative abundance of 3% (in 3/41 samples). This species has been shown to develop strong biofilms with B. vaginale34,35. Previous studies identified high loads of this species as a risk for PTB36,37. Of interest in our study, there was an increased relative abundance of F. vaginae in the control group sample that had no perceived risk of PTB, yet delivered preterm (Fig. 3).

The species Sneathia amnii and Prevotella amnii have recently been identified as emerging candidates for poor pregnancy outcomes38,39. In our study, both of these were increased and had a significant association with PTB (q-value 0.02 and 0.05, respectively; Table 2). Although BVAB-1 (assigned UBA629 sp005465875 in GTDB) has been identified in microbiome studies as a potential problematic species in terms of vaginal health, this species was not identified in this study, perhaps reflective of the cohort studied as it has predominantly been reported in North American cohort studies29,40,41. Overall, however, the BV-associated CST, CST-4, was positively associated with PTB, a finding that is in agreement with previous studies in this area21,42.

Unlike previous reports, we did not see any significant differences in α-diversity between the samples from preterm and full-term pregnancies; however, there was a weak correlation between L. crispatus and full-term pregnancy, and an overall increased abundance of this species in samples from full-term pregnancies. This association is in agreement with several other studies and supports the concept that this species is a beneficial component of a healthy vaginal microbiome. Taken together, these studies may suggest a role of L. crispatus in providing protection from PTB, particularly in a Caucasian population. However, a mechanistic basis for such a role has yet to be revealed. Nonetheless, L. crispatus has previously been identified as a species that dominates a healthy vaginal tract, specifically in subjects free from BV43. It has also been linked to the stability of the vaginal microbiota during pregnancy44. Given that inflammation of the uterus has been linked with PTB45, there may be merit to assessing the ability of L. crispatus to provide a protective effect by suppressing inflammation through H2O2 signalling to control nuclear factor-κB activity46 or, indirectly, whereby the inhibition of BV-associated bacteria prevents the occurrence of a proinflammatory response to the vaginosis47. Importantly, a strong correlation between proinflammatory cytokines and BV-associated bacteria was observed by Fettweis et al.39 in a preterm cohort, further highlighting a potential microbial inflammatory mechanism for preterm onset. In general, lactobacilli have been regarded as a marker of a healthy vaginal microbiome and have been seen to improve pregnancy outcome even in the presence of risk-associated taxa37. An exception to this is L. iners, which has previously been shown to associate with PTB in a similar cohort48. This observation was not repeated in this study.

Although these findings are important indications that merit repeated investigation, there are limitations to this dataset. In particular, due to the relatively low proportion of PTB samples, a much larger sample size will be necessary to investigate the link between the CSTs, species, and preterm outcome. Moreover, this study’s findings are specific to a Caucasian population, whereas ethnicity has previously been shown to be a major confounder in describing the vaginal microbiome. Notably, Romero et al.20 found that the composition of the vaginal microbiota did not differ between PTB and FTB across a cohort of predominantly African American women. The difference in ethnicity in terms of vaginal microbiota and PTB has been further evidenced recently in a study that indicated that the frequency of Lactobacillus, Gardnerella, and Ureaplasma varies significantly between preterm and full-term deliveries within a Caucasian cohort but not within an African American cohort49. Unfortunately, socioeconomic data were not collected as part of this work and, as has been previously reported50,51, could be a confounder in the different microbiome profiles observed. In addition, there were women in both the risk_FTB and risk_PTB groups who had received antibiotic treatment in the 6 months prior to sample collection, yet due to the low numbers it was not possible to determine the influence of this.

Although there is growing evidence for a microbiota-related role in spontaneous PTB, and indeed numerous descriptions of the importance of the microbiota for general vaginal health, there has been a limited understanding of the functional capacity of these microbial communities. As suggested by Heintz-Buschart and Wilmes52, a functional insight is required to develop and test hypotheses for mechanisms of action for host–microbe interactions. Indeed, the sensitivity of community analysis using functional metagenomics has already been shown to distinguish healthy populations from cases of inflammatory bowel disease53,54, type 2 diabetes55, and obesity56. Within our preterm cohort, there was a clear distinction with respect to the functional profile of genes involved in both MFs and BPs. The PTB samples had an increased abundance of functions relating to methionine/homocysteine and folate metabolism in addition to purine metabolism. Notably, maternal folate and homocysteine levels have been associated with poor pregnancy outcomes including PTB57. Moreover, reduced concentrations of cysteine have been measured in cervico-vaginal fluid from preterm samples in a previous study58. Taken together, these observations suggest that further research into methionine/folate/cysteine homeostasis is merited in terms of a role for microbial metabolism and host interaction.

Overall, this study demonstrates the benefits of using shotgun metagenomics to understand the vaginal microbiome of women at risk of PTB. Moreover, we have shown that use of a microbial DNA enrichment kit is a feasible method to increase high-quality microbial sequencing reads to the upper limit of what has been achieved in previous studies (Supplementary Table 4) and overcome the notable problem of host contamination31,59. A distinct variation in both the taxonomic and functional potential of the preterm-linked vaginal microbiome reinforces the concept of a microbiome role in PTB. However, the inability of both this study and others to determine a strict signature highlights the need for much larger controlled studies with the ability to examine confounding factors like ethnicity, biogeography, and predetermined risk factors using high-resolution, strain-level analysis. This study provides evidence for the functional role of the microbiota in spontaneous PTB. Direct gene expression studies are now required if we are to fully elucidate any microbial–host interactions that are involved in the spontaneous onset of PTB.


Study participants and sampling

This was a prospective cohort study with institutional ethics approval by the National Maternity Hospital Research Ethics Committee and maternal written consent. Women were determined to be at risk of spontaneous PTB if they had either a history of previous spontaneous PTB, and were therefore at high risk of subsequent spontaneous PTB (n = 29), or had two previous LLETZ (n = 11). High-risk participants in the study were recruited from women attending the PTB clinic at The National Maternity Hospital, Dublin, Ireland. Anonymized patient data were collected from patient charts by an independent researcher and pregnancy outcome data were collected from patient charts and a computerized database in the National Maternity Hospital, Dublin. Low-risk controls (n = 14) were women who had previously delivered a full-term baby with no prior history of PTB or LLETZ, and were attending routine antenatal care. Inclusion criteria were women over 18 years of age and pregnant. Exclusion criteria were women currently on antibiotic treatment. Following informed consent, a high vaginal swab was taken with a dry cotton swab, using a speculum, from the posterior vaginal fornix and external orifice of the cervix prior to transvaginal ultrasonography, and was delivered to the laboratory within 24 h.

DNA extraction/purification

One millilitre of sterile phosphate-buffered saline was added to each swab followed by vigorous vortexing for 1 min. Total microbial DNA was extracted using the MoBio PowerFood Microbial DNA isolation kit according to the manufacturer’s instructions. Briefly, 700 μl of swab material was centrifuged and the resulting pellet was disrupted by mechanical and enzymatic treatment. DNA was eluted in 100 μl of elution buffer and samples were stored at −20°C. DNA concentrations were determined using the Qubit high sensitivity kit as per manufacturer’s instructions. To reduce sequencing reads mapping to eukaryotic DNA downstream, DNA from each swab was treated with the Microbiome Enrichment kit (NEB) and purified with AMPure magnetic beads (Beckman Coulter). A negative sterile water control was included during the enrichment and carried forward through subsequent sequencing steps. Finally, all samples were normalized to 0.2 ng µl−1.

Shotgun metagenomic sequencing

Libraries of DNA were prepared according to standard Illumina protocols. Briefly, DNA was sheared by heating to 55 °C for 7 min. Paired-end indexes were added and amplification occurred for 12 cycles before samples were purified with AMPure magnetic beads. DNA was quantified using the Qubit dsDNA HS assay kit and the Agilent Bioanalyser 2100 with high sensitivity DNA chips before being pooled to 2 mM. Quality of the sample pool was confirmed by quantitative PCR. A sample of sterile water was processed in parallel with the DNA during library preparation to act as a negative control (Supplementary Table 5). Libraries were sequenced using a 2 × 150 bp paired-end kit on the Illumina NextSeq platform.

Bioinformatic analysis

Raw sequencing data were base-called using Illumina’s bcl2fastq software (v. 2.19). TrimGalore (v. 0.6.0), a Perl wrapper for Cutadapt (v. 1.18)60 and FastQC (v. 0.11.8), was used to remove adapter sequences and low-quality sequences using default parameters. Removal of human DNA contamination was performed by aligning all high-quality paired-end reads to the latest draft of the human genome (hg38) using Bowtie2 (v. 2.3.4)61. The resulting SAM files were converted to BAM format and filtered to keep only unmapped paired-end reads using SAMtools62. Bedtools63 was used to convert the remaining reads from BAM to FASTQ format. Taxonomic assignment of paired-end reads was performed using Kraken264 alignment against the GTDB_54k database created by Méric et al.65. Functional profiling was performed using the HUMAnN2 pipeline (v. 2.8.1)66. The gene families output was renormalized as copies per million reads and regrouped according to Gene Ontology terms.
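Two steps of this workflow lend themselves to a short illustration: keeping only read pairs in which neither mate aligned to the human genome, and rescaling gene-family abundances to copies per million. The sketch below is written in plain Python for clarity and does not reproduce the SAMtools or HUMAnN2 commands used in the study. The flag logic (both mates unmapped, secondary alignments excluded) is one common choice and is an assumption, as are the function names and the example gene-family labels.

```python
# Bit values defined for the SAM FLAG field.
READ_UNMAPPED = 0x4
MATE_UNMAPPED = 0x8
SECONDARY_ALIGNMENT = 0x100


def both_mates_unmapped(flag):
    """True when neither the read nor its mate aligned to the host genome.

    Assumption: a pair is retained only if both mates are unmapped and the
    record is not a secondary alignment.
    """
    required = READ_UNMAPPED | MATE_UNMAPPED
    return (flag & required) == required and not (flag & SECONDARY_ALIGNMENT)


def to_copies_per_million(abundances):
    """Rescale abundances so that they sum to one million."""
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return {name: value / total * 1_000_000 for name, value in abundances.items()}


# Flags 77 and 141 describe an unmapped first and second mate of an unmapped
# pair; flag 99 is a properly paired, mapped first mate; flag 73 is a mapped
# read whose mate is unmapped.
for flag in (77, 141, 99, 73):
    print(flag, both_mates_unmapped(flag))

# Hypothetical gene-family abundances (not study data).
gene_families = {"family_A": 1.0, "family_B": 3.0}
print(to_copies_per_million(gene_families))
```

Requiring both mates to be unmapped is the stricter of the usual options: a pair is discarded as soon as either mate aligns to the human reference.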

Data were visualized using both GraphPad Prism 6 and RStudio (R v.3.6.0). Heatmaps were generated using the ‘ComplexHeatmap’ package67 with samples clustered using the Pearson’s distance metric and columns split by k-means clustering to visualize CSTs, assigned based on previously reported definitions24. Plots for diversity analysis were generated using the ggplot2 package. Statistical analysis was carried out in R using the vegan and RVAideMemoire packages. Multivariate Association with Linear Models 2 (MaAsLiN2, R V.1.2.0) was used to determine independent associations of species and functions with metadata factors. A q-value of <0.25 was considered significant in our analysis.
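The relationship between p-values and the q-values reported in the figures can be illustrated with a minimal sketch of multiple-testing correction. It assumes the Benjamini–Hochberg procedure, which the text does not name explicitly; the feature names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values).

    Each p-value is scaled by m / rank (rank taken in ascending order),
    then a running minimum from the largest rank downwards keeps the
    q-values monotone.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


# Hypothetical features and raw p-values (not study data).
features = ["species_A", "species_B", "species_C", "species_D"]
p_values = [0.001, 0.5, 0.2, 0.04]

Q_THRESHOLD = 0.25  # cut-off quoted in the figure legends
q_values = benjamini_hochberg(p_values)
significant = [f for f, q in zip(features, q_values) if q < Q_THRESHOLD]

for feature, p, q in zip(features, p_values, q_values):
    print(feature, p, round(q, 4))
print(significant)
```

In this toy example a raw p-value of 0.2 is adjusted to roughly 0.27 and so falls outside the 0.25 cut-off, which shows why significance is reported on q-values and not on uncorrected p-values.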

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Sequence data have been deposited in the European Nucleotide Archive (ENA) under the study accession number PRJEB34536.


1. Liu, L. et al. Global, regional, and national causes of child mortality in 2000–13, with projections to inform post-2015 priorities: an updated systematic analysis. Lancet 385, 430–440 (2015).

2. Blencowe, H. et al. National, regional, and worldwide estimates of preterm birth rates in the year 2010 with time trends since 1990 for selected countries: a systematic analysis and implications. Lancet 379, 2162–2172 (2012).

3. Vinturache, A. E., Gyamfi-Bannerman, C., Hwang, J., Mysorekar, I. U. & Jacobsson, B. Maternal microbiome - a pathway to preterm birth. Semin. Fetal Neonatal Med. 21, 94–99 (2016).

4. Khashu, M., Narayanan, M., Bhargava, S. & Osiovich, H. Perinatal outcomes associated with preterm birth at 33 to 36 weeks’ gestation: a population-based cohort study. Pediatrics 123, 109–113 (2009).

5. Pike, K. C. & Lucas, J. S. A. Respiratory consequences of late preterm birth. Paediatr. Respir. Rev. 16, 182–188 (2015).

6. Twilhaar, E. S. et al. Cognitive outcomes of children born extremely or very preterm since the 1990s and associated risk factors: a meta-analysis and meta-regression. JAMA Pediatr. 172, 361–367 (2018).

7. Frey, H. A. & Klebanoff, M. A. The epidemiology, etiology, and costs of preterm birth. Semin. Fetal Neonatal Med. 21, 68–73 (2016).

8. Englund-Ögge, L. et al. Maternal dietary patterns and preterm delivery: results from large prospective cohort study. BMJ 348, g1446 (2014).

9. Moore, E., Blatt, K., Chen, A., Van Hook, J. & DeFranco, E. A. Relationship of trimester-specific smoking patterns and risk of preterm birth. Am. J. Obstet. Gynecol. 215, 109-e1 (2016).

10. Faucher, M. A., Hastings-Tolsma, M., Song, J. J., Willoughby, D. S. & Bader, S. G. Gestational weight gain and preterm birth in obese women: a systematic review and meta-analysis. BJOG 123, 199–206 (2016).

11. Dole, N. et al. Maternal stress and preterm birth. Am. J. Epidemiol. 157, 14–24 (2003).

12. Waldenström, U., Cnattingius, S., Vixner, L. & Norman, M. Advanced maternal age increases the risk of very preterm birth, irrespective of parity: a population-based register study. BJOG 124, 1235–1244 (2017).

13. Goldenberg, R. L., Culhane, J. F., Iams, J. D. & Romero, R. Epidemiology and causes of preterm birth. Lancet 371, 75–84 (2008).

14. Jeffcoat, M. K. et al. Periodontal infection and preterm birth: results of a prospective study. J. Am. Dent. Assoc. 132, 875–880 (2001).

15. Romero, R., Dey, S. K. & Fisher, S. J. Preterm labor: one syndrome, many causes. Science 345, 760–765 (2014).

16. Romero, R. et al. The role of inflammation and infection in preterm birth. Semin. Reprod. Med. 25, 21–39 (2007).

17. Walther-António, M. R. S. et al. Pregnancy’s stronghold on the vaginal microbiome. PLoS ONE 9, 1–10 (2014).

18. Romero, R. et al. The composition and stability of the vaginal microbiota of normal pregnant women is different from that of non-pregnant women. Microbiome 2, 1–19 (2014).

19. MacIntyre, D. A. et al. The vaginal microbiome during pregnancy and the postpartum period in a European population. Sci. Rep. 5, 1–9 (2015).

20. Romero, R. et al. The vaginal microbiota of pregnant women who subsequently have spontaneous preterm labor and delivery and those with a normal delivery at term. Microbiome 2, 1–15 (2014).

21. DiGiulio, D. B. et al. Temporal and spatial variation of the human microbiota during pregnancy. Proc. Natl Acad. Sci. USA 112, 11060–11065 (2015).

22. Brown, R. et al. Role of the vaginal microbiome in preterm prelabour rupture of the membranes: an observational study. Lancet 387, S22 (2016).

23. Nelson, D. B., Shin, H., Wu, J. & Dominguez-Bello, M. G. The gestational vaginal microbiome and spontaneous preterm birth among nulliparous African American women. Am. J. Perinatol. 33, 887–893 (2016).

24. Brooks, J. P. et al. Changes in vaginal community state types reflect major shifts in the microbiome. Microb. Ecol. Health Dis. 28, 1303265 (2017).

25. Leitich, H. et al. Bacterial vaginosis as a risk factor for preterm delivery: a meta-analysis. Am. J. Obstet. Gynecol. 189, 139–147 (2003).

26. Leitich, H. & Kiss, H. Asymptomatic bacterial vaginosis and intermediate flora as risk factors for adverse pregnancy outcome. Best Pract. Res. Clin. Obstet. Gynaecol. 21, 375–390 (2007).

27. Ravel, J. et al. Vaginal microbiome of reproductive-age women. Proc. Natl Acad. Sci. USA 108, 4680–4687 (2011).

28. Janulaitiene, M. et al. Prevalence and distribution of Gardnerella vaginalis subgroups in women with and without bacterial vaginosis. BMC Infect. Dis. 17, 1–9 (2017).

29. Albert, A. Y. K. et al. A study of the vaginal microbiome in healthy Canadian women utilizing cpn60-based molecular profiling reveals distinct Gardnerella subgroup community state types. 1–21 (2015).

30. Chaban, B. et al. Characterization of the vaginal microbiota of healthy Canadian women through the menstrual cycle. Microbiome 2, 1–12 (2014).

31. Goltsman, D. S. A. et al. Metagenomic analysis with strain-level resolution reveals fine-scale variation in the human pregnancy microbiome. Genome Res. 28, 1467–1480 (2018).

32. Serrano, M. G. et al. Racioethnic diversity in the dynamics of the vaginal microbiome during pregnancy. Nat. Med. 25, 1001–1011 (2019).

33. Fettweis, J. M. et al. The vaginal microbiome and preterm birth. Nat. Med. 25, 1012–1021 (2019).

34. Hardy, L. et al. A fruitful alliance: the synergy between Atopobium vaginae and Gardnerella vaginalis in bacterial vaginosis-associated biofilm. Sex. Transm. Infect. 92, 487–491 (2016).

35. Hardy, L. et al. Unravelling the bacterial vaginosis-associated biofilm: a multiplex Gardnerella vaginalis and Atopobium vaginae fluorescence in situ hybridization assay using peptide nucleic acid probes. PLoS ONE 10, 1–16 (2015).

36. Bretelle, F. et al. High Atopobium vaginae and Gardnerella vaginalis vaginal loads are associated with preterm birth. Clin. Infect. Dis. 60, 860–867 (2015).

37. Elovitz, M. A. et al. Cervicovaginal microbiota and local immune response modulate the risk of spontaneous preterm delivery. Nat. Commun. 10, 1–8 (2019).

38. Gentile, G. L. et al. Identification of a cytopathogenic toxin from Sneathia amnii. J. Bacteriol. 202, 1–11 (2020).

39. Fettweis, J. M. et al. The vaginal microbiome and preterm birth. Nat. Med. 25, 1012–1021 (2019).

40. Holm, J. B. et al. Ultrahigh-throughput multiplexing and sequencing of >500-base-pair amplicon regions on the Illumina HiSeq 2500 platform. mSystems 4, 1–10 (2019).

41. Holm, J. B. et al. Comparative metagenome-assembled genome analysis of Lachnovaginosum genomospecies, formerly known as BVAB1. Preprint at, (2019).

42. Freitas, A. C., Bocking, A., Hill, J. E. & Money, D. M. Increased richness and diversity of the vaginal microbiota and spontaneous preterm birth. Microbiome 6, 117 (2018).

43. Srinivasan, S. et al. Temporal variability of human vaginal bacteria and relationship with bacterial vaginosis. PLoS ONE 5, e10197 (2010).

44. Verstraelen, H. et al. Longitudinal analysis of the vaginal microflora in pregnancy suggests that L. crispatus promotes the stability of the normal vaginal microflora and that L. gasseri and/or L. iners are more conducive to the occurrence of abnormal vaginal microflora. BMC Microbiol. 9, 116 (2009).

45. Elovitz, M. A., Wang, Z., Chien, E. K., Rychlik, D. F. & Phillippe, M. A new model for inflammation-induced preterm birth: the role of platelet-activating factor and Toll-like receptor-4. Am. J. Pathol. 163, 2103–2111 (2003).

46. Voltan, S. et al. Lactobacillus crispatus M247-derived H2O2 acts as a signal transducing molecule activating peroxisome proliferator activated receptor-γ in the intestinal mucosa. Gastroenterology 135, 1216–1227 (2008).

47. Mirmonsef, P. et al. The effects of commensal bacteria on innate immune responses in the female genital tract. Am. J. Reprod. Immunol. 65, 190–195 (2011).

48. Kindinger, L. M. et al. The interaction between vaginal microbiota, cervical length, and vaginal progesterone treatment for preterm birth risk. Microbiome 5, 1–14 (2017).

49. Callahan, B. J. et al. Replication and refinement of a vaginal microbial signature of preterm birth in two racially distinct cohorts of US women. Proc. Natl Acad. Sci. USA 114, 9966–9971 (2017).

50. Virtanen, S. et al. Vaginal microbiota composition correlates between Pap smear microscopy and next generation sequencing and associates to socioeconomic status. Sci. Rep. 9, 1–9 (2019).

51. Dunlop, A. L. et al. Stability of the vaginal, oral, and gut microbiota across pregnancy among African American women: the effect of socioeconomic status and antibiotic exposure. PeerJ 2019, e8004 (2019).

52. Heintz-Buschart, A. & Wilmes, P. Human gut microbiome: function matters. Trends Microbiol. 26, 563–574 (2018).

53. Armour, C. R., Nayfach, S., Pollard, K. S. & Sharpton, T. J. A metagenomic meta-analysis reveals functional signatures of health and disease in the human gut microbiome. mSystems 4, 1–15 (2019).

54. Schirmer, M. et al. Dynamics of metatranscription in the inflammatory bowel disease gut microbiome. Nat. Microbiol. 3, 337–346 (2017).

55. Qin, J. et al. A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature 490, 55–60 (2012).

56. Le Chatelier, E. et al. Richness of human gut microbiome correlates with metabolic markers. Nature 500, 541–546 (2013).

57. Bergen, N. E. et al. Homocysteine and folate concentrations in early pregnancy and the risk of adverse pregnancy outcomes: the Generation R Study. BJOG 119, 739–751 (2012).

58. Ghartey, J., Bastek, J. A., Brown, A. G., Anglim, L. & Elovitz, M. A. Women with preterm birth have a distinct cervicovaginal metabolome. Am. J. Obstet. Gynecol. 212, 776.e1–776.e12 (2015).

59. Marotz, C. A. et al. Improving saliva shotgun metagenomics by chemical host DNA depletion. Microbiome 6, 1–9 (2018).

60. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet. J. 17, 10–12 (2011).

61. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357 (2012).

62. Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

63. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

64. Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 1–13 (2019).

65. Méric, G. et al. Correcting index databases improves metagenomic studies. bioRxiv (2019).

66. Franzosa, E. A. et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat. Methods 15, 962–968 (2018).

67. Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).


This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under Grant Numbers SFI/12/RC/2273 and 16/SP/3827, and an Irish Research Council fellowship (EPSPD/2016/25) with Alimentary Health Ltd.

Author information




Conceptualization: C.F., D.C., S.H., F.M., and P.C. Ethical committee application: D.C., S.H., and F.M. Experimental investigation: C.F. and E.L. Bioinformatics and statistics: C.F., C.W., and P.C. Writing – original draft: C.F. Writing – review and editing: C.F., D.C., C.W., S.H., F.M., and P.C. Sample collection and data curation: C.F., D.C., and S.H. Figure preparation: C.F. Funding acquisition: C.F., F.M., and P.C. F.M. and P.C. are joint senior authors. All authors discussed the results and commented on the manuscript.

Corresponding authors

Correspondence to Fionnuala M. McAuliffe or Paul D. Cotter.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

About this article

Cite this article

Feehily, C., Crosby, D., Walsh, C.J. et al. Shotgun sequencing of the vaginal microbiome reveals both a species and functional potential signature of preterm birth. npj Biofilms Microbiomes 6, 50 (2020).
