Extensive research and development efforts have been made to investigate the epidemiological risk factors and biological mechanisms that underlie preterm birth (PTB), to test preventive interventions, and to optimize care for babies born prematurely. However, very few approaches have achieved widespread feasibility and impact, and the burden of PTB has remained high in many regions worldwide1. Mothers in sub-Saharan Africa and South Asia face the highest risk of early delivery and often lack access to life-saving neonatal interventions1. Maternal HIV infection, which complicates as many as 1 in 4 pregnancies in parts of Africa2, can increase the risk of preterm delivery by as much as 50%3. Counterintuitively, women with HIV who initiate antiretroviral therapy (ART) before pregnancy may have even higher rates of preterm delivery than those who start ART during pregnancy4. The biological mechanism(s) linking the virus and its treatment to PTB remain poorly understood.

PTB is a final common outcome of several distinct pathways that can be broadly delineated by those that follow spontaneous labor or pre-labor membrane rupture and those initiated by a provider due to high-risk maternal or fetal conditions. Many spontaneous PTB (sPTB) phenotypes can be attributed to systemic or local infection, inflammation, or immune activation processes. Few studies to date have characterized the effect of the vaginal microbiome and immune factors on sPTB5,6, and limited data exist comparing these factors by maternal HIV serostatus and ART exposure6,7,8,9,10,11. We hypothesize that the elevated risk of preterm delivery faced by women with HIV may be related to measurable differences in the vaginal microbial and immune milieu during pregnancy. In previous analyses, we demonstrated that women with HIV, and particularly those who had not yet started ART at the time of conception, had higher vaginal bacterial diversity and vaginal inflammatory markers12,13.

In this follow-up analysis employing whole community metagenomic sequencing and parallel cytokine assays, we investigate whether characteristics of the vaginal microbiome and local immune response are correlated and predict sPTB in women with and without HIV. Metagenomes afford higher resolution characterization of the composition of the vaginal microbiome and clustering both on taxonomic (species) and functional (genes) attributes14. Elucidating biological processes by which HIV infection may incite spontaneous preterm labor and delivery could inform the development of effective therapeutic interventions for the prevention of HIV-related prematurity and its consequences.


Study design

The Zambian Preterm Birth Prevention Study (ZAPPS; Identifier: NCT02738892) is an ongoing prospective antenatal cohort at the Women and Newborn Hospital of the University Teaching Hospitals (UTH) in Lusaka. Full study procedures have been described previously15. Briefly, ZAPPS participants are enrolled prior to 24 gestational weeks and receive comprehensive antenatal care, laboratory testing, biological specimen collection, and ultrasound to establish gestational age. All deliveries prior to 37 gestational weeks were classified as provider-initiated or spontaneous by an on-site obstetrician (JP). sPTB was defined as delivery prior to 37 gestational weeks preceded by spontaneous preterm labor or pre-labor spontaneous rupture of membranes. Preterm labor inductions and pre-labor cesarean deliveries were considered provider-initiated preterm births and excluded from this analysis.

We defined exposures as HIV serostatus at cohort enrollment (HIV− vs. HIV+) and as a 3-level variable that further distinguished HIV+ participants by viral load suppression (i.e., undetectable vs. detectable). We determined HIV infection at enrollment by screening all participants using the SD Bioline 3.0 test (SD Biostandard Diagnostics, India) and confirming positive cases with Determine HIV-1/2 Ag/Ab Combo test (Alere Inc., Waltham, MA). Viral load assays were conducted using the Abbott RealTime HIV-1 Assay (Abbott Molecular, Des Plaines, IL)16,17,18.


The University of Zambia Biomedical Research Ethics Committee and the University of North Carolina Institutional Review Board each granted approval to conduct the ZAPPS study and for protocol-related specimen testing. All research was performed in accordance with relevant guidelines and regulations. Participants in ZAPPS provided individual written informed consent before undergoing study procedures.

Specimen collection

In ZAPPS, mid-vaginal dry polyester swabs collected at enrollment at subsequent antenatal visits are stored on-site at − 80 °C. Initial specimen selection for vaginal microbiome and cytokine analysis has been explained in detail previously12,13. Briefly, to maximize available budget, we used a double sampling technique that selected for participation all eligible HIV+ participants and all HIV− participants who delivered spontaneously prior to 37 gestational weeks. Among HIV− women who delivered at term, we selected participants at random at a proportion determined by available resources. Finally, we also analyzed repeat specimens collected between 24 and 36 gestational weeks from a random subset of HIV+ participants. All staff who performed laboratory analyses were blinded to baseline characteristics and clinical outcomes.

DNA isolation, sequencing, & bioinformatic analysis

As described in detail previously12, we performed whole genome shotgun (WGS) sequencing of bacterial DNA extracted from vaginal swabs. To summarize, genomic DNA was isolated from vaginal swab samples, processed with the Nextera XT DNA Library Preparation Kit (Illumina), purified with Agencourt AMPure XP Reagent, and sequenced on an Illumina HiSeq 4000 system.

Sequencing output from the Illumina HiSeq platform was converted to FASTQ format and demultiplexed using Illumina Bcl2Fastq conversion software (Illumina). Quality control of the demultiplexed sequencing reads were verified with FastQC software (Babraham Institute, Cambridge, UK). Sequencing reads originating from the host were removed using BMTagger (v3.101)19 and the human reference genome GRCh38/hg3820. The data were then further processed using sortmeRNA (v2.1b) to remove ribosomal RNA sequencing reads21 and then Trimmomatic (v0.3653) for quality filtering/trimming22. The remaining reads were then mapped to the VIRGO non-redundant gene catalog using Bowtie23 (v1.2.2) as described previously14. Gene length corrected mapping results from VIRGO were used to establish the taxonomic composition of the vaginal microbiota24 (Supplemental Table). Within-species diversity of these microbiota was established for only the four most prevalent bacteria and only included samples with at least 90% of the genes of the taxon’s genome: L. crispatus (n = 18), L. iners (n = 115), Gardnerella (n = 152), and A. vaginae (n = 85). Briefly, the patterns of gene content for each taxon were subjected to hierarchical clustering using Bray–Curtis dissimilarity and Ward linkage. For each taxon two clusters were identified each representing a group of genes (i.e., a set of strains) co-occurring frequently in this dataset. These clusters were previously coined metagenomic subspecies14. Because only Gardnerella metagenomic subspecies demonstrated differences in incidence of sPTB, only this taxon’s metagenomic subspecies were integrated in hierarchical clustering of samples using Bray–Curtis dissimilarity and Ward linkage, where the total abundance of Gardnerella in each sample was assigned to either metagenomic subspecies Gardnerella type 1 or Gardnerella type 2. For samples that did not meet our threshold for Gardnerella genome coverage to be included into the metagenomic subspecies analysis, the Gardnerella relative abundance was assigned to “Gardnerella other”. Considering fluctuating taxonomy within the Gardnerella genus and for clarity of comparisons between metagenomic subspecies identified in our analysis and species of other genera, herein we refer to these three Gardnerella metagenomic types as “metagenomic subspecies”. Statistical support for seven within-study metagenomic clusters was found using silhouette scores. Species level composition for Gardnerella was determined for each sample using VIRGO25. Shannon diversity index (SDI), a measure of species richness and evenness (alpha diversity), was calculated for each sample as a sum of each individual species’ proportional abundance multiplied by the natural logarithm of this same proportion26.

Inflammatory marker analysis

A second vaginal swab collected at the same timepoints was analyzed by multiparameter bead array for interleukin-1β (IL-1β), IL-2, IL-6, IL-8, IL-10, IL-12p70, interferon-ɣ (IFN-ɣ), IFN-ɣ-induced protein 10 (IP-10), tumor necrosis factor-α (TNF-α), macrophage inflammatory protein-1α (MIP-1α), MIP-1β, and transforming growth factor-β (TGF-β) by Milliplex (Millipore-Sigma, Minneapolis, MN) and secretory leukocyte protease inhibitor (SLPI) and soluble CD14 (sCD14) by single-analyte ELISA (R&D Systems, Minneapolis, MN). As described in detail previously13, we used confirmatory factor analysis to derive inflammatory scores for each sample based on activity of the measured pro-inflammatory markers in vaginal swabs with standardized factor loadings at a significance level of p > 0.01 (IL-1β, IL-2, IL-6, IL-8, IL-12p70, IP-10, TNF-α, MIP-1β, and sCD14).

Statistical analysis

We analyzed baseline demographic data of the sub-study cohort, calculating median and interquartile range (IQR) for each continuous variable, and frequency and percentage for each categorical variable. We compared categorical and continuous variables between primary outcome groups using chi-square and linear regression, applying probability weighting to account for sampling technique.

We described overall and relative taxonomic abundances, species diversity, individual inflammatory biomarkers, and inflammatory scores, quantifying their associations with HIV and viral suppression in unadjusted models. Inverse probability sampling weights were applied to account for unequal sampling technique, with robust standard errors computed by the linear variance estimator27.

Correlations between relative abundances of key bacterial taxa and inflammatory markers were estimated using Spearman’s rank-order correlation coefficient (rho). A weighted linear regression model was built to estimate the association between individual inflammatory markers and SDI among samples collected between 16–20 gestational weeks, adjusting for HIV and detectable viral load at baseline. All inflammatory markers assayed were included in the initial model and then sequentially removed by stepwise backward elimination.

We calculated median SDI and inflammatory scores of baseline samples in each metagenomic cluster and estimated the relationship of each compared to L. crispatus-dominated mgClust7 using weighted linear regression. Similarly, SDI and inflammatory scores were calculated for baseline samples among each Gardnerella metagenomic subspecies classification. We reported linear associations of Shannon diversity and vaginal inflammation in samples with Gardnerella type 1 and type 2 compared to the “other” Gardnerella subtype overall and among subgroups of HIV serostatus.

We identified optimal cutpoints for dichotomizing mean relative abundances of key taxa that maximized the product of sensitivity and specificity to predict the outcome of sPTB using the method proposed by Liu28. Associations between vaginal microbiota and inflammation by sPTB were calculated as unadjusted and adjusted prevalence ratios using Poisson regression with robust error variance29,30. Because twin gestation and short cervical length (< 2.5 cm) are strong independent predictors of sPTB and rare in our cohort, we excluded women with these risk factors from analyses of this outcome. Multivariable models were adjusted for potential confounding due to maternal age, body mass index (BMI), parity, prior PTB, as well as HIV serostatus (in baseline analyses) or detectable viral load (for matched repeat analyses).


Baseline characteristics

Of 221 participants who contributed vaginal specimens to this secondary analysis, 192 (87%) delivered at term and 29 (13%) experienced sPTB (Table 1). Women with sPTB had higher prevalence of prior preterm birth, twin gestation, short cervical length, and vaginal bleeding in pregnancy. Of 85 (38%) participants with HIV at enrollment, 50 (59%) had initiated ART prior to conception, and 39 of those (78%) had undetectable viral load. Median gestational age at first sample collection was 18 weeks (IQR: 17–19). Matched cytokine analysis in baseline samples was available for 207 (94%) participants. We performed repeat microbiome analysis of swabs collected at 32 weeks (IQR: 29–32) among 66 HIV+ participants selected at random, of whom 47 (71%) also underwent a concurrent repeat cytokine analysis (Fig. 1).

Table 1 Characteristics of participants with vaginal specimens analyzed in ZAPPS cohort, N = 221.
Figure 1
figure 1

Flowchart of ZAPPS participants with term or spontaneous preterm birth (sPTB) included in vaginal microbiome and cytokine analyses at baseline (16–20 gestational weeks) and repeat (24–36 gestational weeks) timepoints.

Taxonomic prevalence, abundances, and metagenomic clustering

A total of 201 unique bacterial species were identified by taxonomic profiling of the metagenomic sequence dataset. The most common bacterial taxa across all 287 specimens collected was L. iners, present in all samples (relative abundance IQR: 2%–77%) and all subspecies of the Gardnerella genus, present in 99% of samples (relative abundance IQR: 2%–52%). The study-wide mean relative abundance across samples of L. iners was 35%, of Gardnerella was 30%, of L. crispatus was 11%, of Prevotella species was 10%, of A. vaginae was 4%, and of Candidatus (Ca.) Lachnocurva vaginae (formerly BVAB131) was 3%.

Two distinct metagenomic subspecies within the Gardnerella genus were identified in our samples. The main difference between these metagenomic subspecies involved two recently described, and closely related Gardnerella species: G. swidsinkii and G. leopoldii25. The first Gardnerella subspecies was characterized by a higher proportion of G. swidsinkii/G. leopoldii while the second contained a more diverse array of Gardnerella spp. and a low proportion of G. swidsinkii/G. leopoldii (mean proportion of G. swidsinkii/G. leopoldii was 48% vs 4%, respectively; Fig. 2). A third profile contained a lower overall relative abundance of Gardnerella such that it precluded assignment to a metagenomic subspecies. We refer to these metagenomic subspecies as Gardnerella types 1, type 2, and “other”, respectively.

Figure 2
figure 2

Proportions of assignable Gardnerella abundance in two distinct profiles of metagenomic subspecies of Gardnerella present in vaginal specimens collected between 16–20 weeks and 24–36 weeks, N = 152.

Median SDI across all 287 samples was 0.93 (IQR: 0.47–1.50) and among baseline samples was 0.86 (IQR: 0.44–1.51). Among matched samples from 66 HIV+ participants who also had a later sample analyzed, SDI was similar between baseline (median 1.20, IQR: 0.59–1.65) and repeat (median 1.05, IQR: 0.64–1.49; p = 0.7) specimen collection timepoints.

Based on clustering of all samples, 7 major metagenomic clusters (mgClust) were identified (Fig. 3): mgClust1 was dominated by L. iners (mean relative abundance 89%); mgClust2 by a mix of L. iners (54%) and Gardnerella “other” (26%); mgClust3 by Gardnerella type 1 (70%); mgClust4 by P. bivia, A. vaginae, and Gardnerella (a mix of all three metagenomic subspecies), mgClust5 by L. iners, Gardnerella type 2, and Ca. Lachnocurva vaginae; and mgClust6 by Gardnerella type 2, L. iners, and P. bivia. Finally, mgClust7 comprised predominantly L. crispatus (86%) with minor co-occurrence of other lactobacilli.

Figure 3
figure 3

Heat map of vaginal microbial composition of all specimens collected between 16–20 weeks 24–36 weeks, N = 287.

Microbiome characteristics by HIV status

Presence of the Gardnerella “other” metagenomic subspecies was most common in baseline samples collected from women without HIV (n = 77/136, 57%), intermediate in those collected from HIV+ women with undetectable virus (n = 16/44, 36%; p = 0.03), and least common among those collected from women with detectable virus (n = 10/41, 24%; p = 0.001). Conversely, the presence of Gardnerella type 2 was most common in baseline samples from women without HIV (n = 30/136, 22%), intermediate in those collected from HIV+ women with undetectable HIV (22/44, 50%; p = 0.001), and least common among those collected from women with detectable virus (n = 24/41, 59%; p < 0.001).

The relative distribution of vaginal specimens across metagenomic clusters at baseline also differed by HIV serostatus and viral load (Fig. 4). Compared to participants without HIV, women with HIV overall had higher prevalence of microbiota dominated by Gardnerella type 2 and other mixed anaerobes in mgClust5 (17% vs. 6%; p = 0.02) and mgClust6 (27% vs. 11%; p = 0.002), and markedly lower prevalence of the L. crispatus-dominant mgClust7 (4% vs. 23%; p = 0.001). While women with HIV had modestly higher prevalence of mgClust4 compared to those without HIV (11% vs. 4%; p = 0.05), this was driven by the subset of HIV+ participants with detectable virus who had substantially higher relative prevalence of mgClust4 (15%; p = 0.02). Consistent with our previous results, median SDI at baseline was higher in specimens collected from HIV+ women with undetectable viral load (1.17, IQR: 0.51–1.66; p = 0.01) and highest in those with detectable virus (median 1.31, IQR: 0.85–1.66; p < 0.001) compared to those without HIV (0.74, IQR: 0.35–1.26). Baseline vaginal inflammation scores were statistically similar between HIV+ (median: 0.82, IQR: − 0.39–1.08) compared to HIV− participants (median: 0.40, IQR: − 0.74–1.02; p = 0.3).

Figure 4
figure 4

Prevalence of metagenomic community cluster (mgClust) by HIV status and viral load (VL) at 16–20 gestational weeks. Relative percents weighted for sampling and p values calculated by weighted logistic regression with HIV− as comparator group.

Vaginal inflammation and microbiome characteristics

In vaginal specimens collected at 16–20 weeks, moderate positive correlations were noted between log-transformed concentrations of IL-1β, IL-10, and sCD14 and relative abundances of Gardnerella type 2, A. vaginae, and P. bivia, while negative correlations were found with L. crispatus and Gardnerella “other (Table 2). Conversely, a moderate negative correlation existed between SLPI and relative abundance of Gardnerella type 2 and A. vaginae, and positive correlations with Gardnerella “other” and L. crispatus.

Table 2 Correlation between key relative bacterial abundances and log cytokine concentrations, represented as Spearman rank-order coefficients (rho). Bolded coefficients represent moderate correlation (i.e., |.3| to |.5|) between bacterial abundance and cytokine concentration and with p < 0.001, adjusted for multiple comparisons by Bonferroni correction.

IL-1β, IL-10, and sCD14 concentrations were associated with higher SDI, while IL-2, IL-6, IL12p70, and SLPI were each modestly associated with lower SDI (Table 3). Inflammatory scores increased with SDI (coeff+ 0.66, 95%CI 0.28, 1.03; p = 0.001); both were highest among specimens in mgClust2, mgClust4, mgClust5, and mgClust6 (Fig. 5), and were moderately correlated overall (rho + 0.3; p < 0.001). Compared to L. crispatus-dominated mgClust7, both SDI and inflammatory scores were significantly higher in specimens of anaerobe-abundant mgClust2 through mgClust6, but only modestly higher in specimens of L. iners-dominated mgClust1 (Table 4). Both SDI and inflammatory scores were lowest in samples with Gardnerella “other” and significantly higher in those with Gardnerella type 2 (Table 5). The highest SDI and inflammatory scores were noted in samples collected from participants with Gardnerella type 2, regardless of HIV status.

Table 3 Shannon Diversity Index by log vaginal cytokine concentrations at 16–20 gestational weeks, N = 206.
Figure 5
figure 5

Median Shannon diversity indices (SD) and inflammatory scores (IS) by metagenomic clusters (mgClust) in vaginal specimens collected between 16–20 gestational weeks. p values calculated by weighted linear regression of SD and IS in each mgClust compared to L. crispatus-dominant mgClust 7, adjusting for HIV serostatus at enrollment.

Table 4 Shannon diversity indices and inflammatory scores by metagenomic cluster (mgClust) at 16–20 gestational weeks.
Table 5 Shannon diversity indices and inflammatory scores by Gardnerella metagenomic subspecies at baseline at 16–20 gestational weeks.

Effect of microbiome and inflammation on sPTB

In empirical analysis of optimal cutpoints among baseline samples, mean relative abundance of L. iners above 26% (prevalence ratio, PR 2.5; 95% CI: 1.2, 5.2; p = 0.02), Gardnerella “other above 0.3% (PR 2.5; 95% CI 1.2, 5.4; p = 0.02), and Gardnerella type 2 above 50% (PR 2.6; 95% CI 1.1–6.4; p = 0.03) were each associated with sPTB and remained so in models adjusted for HIV and viral load. Whereas mean relative abundances of L. iners and Gardnerella “other among baseline samples were similar by maternal HIV status, compared to samples collected from participants without HIV (10%), women with HIV had higher mean relative abundance of Gardnerella type 2 (23%; p = 0.001). Although no cutpoint of Gardnerella type 1 was found to protect against sPTB, mean relative abundances were higher in samples from women who delivered at term (14%) compared to sPTB (3%; p < 0.001) and trended higher among women without HIV (15%) compared to those with HIV (10%; p = 0.08). Finally, mean relative abundances of L. crispatus were similar between samples from term (13%) compared to sPTB (15%; p = 0.9) but were lower among women with HIV (3%) compared to those without (20%; p < 0.001).

Among 192 participants who delivered at term, 10 (5%) had baseline specimens in mgClust2 compared to 5 (33%) among 16 who had sPTB between 34–36 weeks (PR 6.9; 95% CI 2.6, 18.3; p < 0.001) (Fig. 6). When restricting the outcome to sPTB < 34 weeks, 14 (6%) participants met this more severe outcome, of whom 5 (36%) had baseline specimens in mgClust6 (PR 2.6; 95% CI 1.2, 5.8; p = 0.02).

Figure 6
figure 6

Prevalence of metagenomic clusters (mgClust) at 16–20 gestational weeks among participants with term birth, spontaneous preterm birth at 34–36 weeks (sPTB 34–36), and spontaneous preterm birth before 34 weeks (sPTB < 34). Relative percents weighted for sampling and p values calculated by weighted Poisson regression of prevalence of mgClust between preterm birth outcomes compared to term; * p < 0.05; ** p < 0.001.

Baseline vaginal inflammatory scores were higher among participants who experienced sPTB (median 0.90, IQR: 0.03–1.28) compared to those who delivered at term (median 0.50, IQR: − 0.73–1.04) (Table 6). The prevalence of sPTB increased with higher vaginal inflammatory scores at baseline in models weighted for sampling and adjusted for maternal age, BMI, parity, prior PTB, and HIV serostatus (APR 2.8, 95% CI: 1.5, 5.2; p = 0.001). Baseline SDI were similar between those who experienced sPTB and those who delivered at term.

Table 6 Shannon diversity index and vaginal inflammatory scores at baseline (16–24 weeks) and the change (Δ) from baseline to repeat (24–36 gestational weeks) between participants experiencing term birth and spontaneous preterm birth (sPTB).

Microbial stability and inflammatory changes among HIV+ women

Among 66 participants with HIV who contributed vaginal specimens for vaginal microbiome characterization at a second timepoint (median 32 weeks; IQR: 29–32), 32 (48%) had undetectable viral load at enrollment and had initiated ART before pregnancy. Of the 34 (52%) with detectable virus, 5 (15%) had started ART before pregnancy. Between baseline specimens collected at 16–20 weeks and repeat specimens, median change in SDI was statistically similar among the 37 participants who had initiated ART before pregnancy (+ 0.11; IQR: − 0.28–0.48) compared to the 29 who had not (− 0.17; IQR: − 0.41–0.25; rank sum p = 0.5). Vaginal inflammation overall decreased in women who had started ART before pregnancy (median -0.33, IQR: − 0.97–0.34) while it increased in participants who had not (median 0.31, IQR: − 0.22–0.90; rank sum p = 0.02).

Nine (14%) participants with repeat vaginal samples analyzed delivered spontaneously before term. SDI increased between baseline and repeat collection timepoints among women who went on to have a sPTB (median + 0.18, IQR: 0.11–0.21) while it decreased among those who delivered at term (median − 0.11, IQR: − 0.54–0.37). In adjusted models, increasing SDI predicted sPTB (APR 2.5; 95%CI 1.1, 5.6). Vaginal inflammatory scores increased between baseline and repeat specimens modestly more among participants who experienced sPTB (median + 0.37, IQR: − 0.07–0.48) compared to those who delivered at term (median − 0.11; IQR: − 0.81–0.72), but confidence intervals were wide and included the null in weighted multivariable models.


We employed metagenomic sequencing of the vaginal microbiome and assays of local inflammatory markers to investigate the relationships between the microbiome and sPTB in a cohort of pregnant women with and without HIV in Zambia. Our analysis confirms a high prevalence of diverse, anaerobe-rich microbiota in our cohort overall, and particularly among women with HIV. In this analysis, vaginal samples were classified into 7 distinct types based on clustering metagenomic content, whose distribution varied by maternal HIV and, in some cases, by viral suppression status. Two groups dominated by Gardnerella and L. iners species predicted sPTB. We identified two Gardnerella metagenomic subspecies (a group of co-occurring strains/species of Gardnerella defined by gene content) whose prevalence varied by maternal HIV serostatus and viral suppression, were associated with varying levels of microbial diversity and vaginal inflammation, and differentially predicted sPTB. These Gardnerella metagenomic subspecies were distinguished by the proportion of G. swidsinkii/G. leopoldii, two closely related Gardnerella subspecies25. Gardnerella metagenomic subspecies type 1, dominated by G. swidsinkii/G. leopoldii, was modestly less abundant both in women with HIV and those who delivered sPTB. In contrast, Gardnerella metagenomic subspecies type 2, notable for a highly diverse mix of other subspecies of Gardnerella, was more abundant among women with HIV and those who delivered sPTB. Vaginal microbiota dominated by L. crispatus, although very uncommon among women with HIV and rare overall, did not confer the anticipated protective effect against preterm birth as presented in other cohorts32. Finally, we described multiple correlations between vaginal microbiome and inflammatory markers and found that both vaginal inflammation among all participants at baseline and an increase in microbial diversity through pregnancy among a subset with HIV were associated with sPTB.

Similar to other studies during and outside of pregnancy, pregnant women with HIV in ZAPPS exhibited more diverse and anaerobe-rich vaginal microbiota compared to women without HIV. However, a preponderance of literature in sub-Saharan Africa demonstrates that it is Lactobacillus-deficient, anaerobe-rich vaginal microbiota that confers a higher susceptibility to HIV infection itself, such that it remains unclear whether HIV is a cause or effect of increased microbial diversity and vaginal inflammation in pregnancy. The uncertain causal relationship between HIV infection and the vaginal microbiome is further complicated by broader shifts towards Lactobacillus dominance mediated by a physiologic estrogen excess during pregnancy, which may be disrupted by HIV-related chronic inflammation, immune reactivation with ART initiation, or certain antiretroviral agents. Although our analysis was limited in size, women who had started ART prior to conception had a reduction in vaginal inflammation throughout pregnancy compared to those who had not13, while the change in alpha diversity (SDI) trended in the opposite direction. This may indicate that ART initiation activates local inflammatory pathways with either minimal or modest benefit on the microbial milieu, but longitudinal analyses of vaginal microbiota from preconception through pregnancy, and from pre- and post-ART initiation, are needed to confirm this hypothesis. Furthermore, whereas nearly all participants with HIV in ZAPPS were taking efavirenz-based regimens, dolutegravir-based ART is now recommended as first line instead such that future research will need to address any differential effects between these exposures.

In previous reports derived from the same ZAPPS cohort, we described associations between HIV and sPTB33, HIV and anaerobe-rich microbiota12, ART initiation and vaginal inflammation13, and vaginal inflammation among women who experienced sPTB13. In this analysis that employed updated metagenomic sequencing, linked the vaginal microbiome to inflammation, and analyzed associations between the microbiome and birth outcomes, we found correlations between vaginal inflammatory markers and microbiome and demonstrated that metagenomic characteristics more common among women with HIV were associated with inflammation and sPTB. Using 16S rRNA gene sequencing to characterize the vaginal microbiota, Gudza-Mugabe and colleagues reported that, whereas pregnant women living with HIV in Zimbabwe had higher prevalence of diverse, anaerobe-rich microbiota, and moderate correlations were noted between certain bacterial taxa and vaginal cytokines, no association was found between HIV and inflammation, and the higher risk of preterm birth among women with HIV was independent of the vaginal microbiota. In contrast, we found characteristics of the vaginal microbiome and inflammation predicted sPTB even after adjusting for HIV serostatus and viral suppression. Methodological differences may limit direct comparison between the Gudza-Mugabe report and our study, including methods of estimating gestational age, gestational ages at assessment, distinctions in viral load suppression and ART initiation timing, and the higher resolution in Gardnerella speciation by metagenomic sequencing. Furthermore, HIV differentially increases the risk of spontaneous over provider-initiated preterm deliveries such that risk estimates and associations may be blunted when examining PTB overall33; this may partly explain the null results reported by others.

The direct causes of sPTB are often unknown, but overt infection leading to inflammation and immune activation are common antecedents. In concert with findings among pregnant and non-pregnant women6,34,35,36,37, we found moderate correlations between vaginal inflammation and the composition of the vaginal microbiome at baseline, and independent associations between the vaginal microbiome and risk of sPTB. However, over half of our participants had anaerobe-rich Lactobacillus-deficient type of vaginal microbiota and a similar proportion experienced sPTB as those with microbiota dominated by L. crispatus, commonly considered protective and anti-inflammatory. Although we noted a trend in L. iners-dominant communities found more commonly in women with late sPTB between 34 and 36 weeks compared to G. vaginalis-rich communities in women with earlier sPTB < 34 weeks, additional studies with larger sample sizes are needed to confirm whether vaginal microbiome composition differentially predicts severity of prematurity. Furthermore, clear geographic variations in common vaginal microbial characteristics and the associated risk of preterm birth highlight the need for population-specific approaches to classifying the vaginal microbiome and to identifying women at highest risk who will most benefit from preventive therapies5,7,38,39,40,41. In our population, relative abundance of metagenomic subspecies of Gardnerella may better convey risk or protection than uncommon Lactobacillus species or community state type classifications. Further examination of the functional make-up of L. crispatus in this cohort and others where it was found protective is warranted.

We acknowledge several limitations to this analysis. Because of the nature of our observational cohort that relied on standard antenatal care practices, we could not investigate the role of sexually transmitted infections other than HIV and syphilis and we were limited by some missingness in baseline covariates. Additionally, we did not collect data on recent antibiotic use, which could differ by HIV serostatus but would likely bias our findings toward no association; nonetheless, recent antibiotic use was conceivably uncommon in the baseline cohort of samples collected at the first antenatal visit. Due to our sample selection procedure, the characteristics of this nested study are not directly generalizable to the full cohort. Although we weighted analyses to account for sample selection using inverse probabilities, we are currently undertaking more comprehensive analyses in a much larger sample of our cohort population to better characterize the interactions between HIV and ART, the vaginal microbiome and inflammation, and adverse birth outcomes. Similarly, due to limited funding, the current analyses were unable to investigate longitudinal differences between participants with and without HIV. The use of metagenomic sequencing is a strength in this study and afforded a higher resolution of Gardnerella which 16S rRNA gene sequencing cannot provide.

In summary, pregnant women in Zambia have high prevalence of anaerobe-rich vaginal microbiota correlated with local inflammation, and women with HIV exhibit characteristics of the vaginal microbiome associated with spontaneous preterm birth. These findings suggest the risk of preterm birth faced by women with HIV may be mediated by the vaginal microbial and inflammatory environment and could be a target for novel preventive therapies aimed at restoring a protective vaginal microenvironment. However, since many women with diverse vaginal microbiota and inflammation still deliver at term and certain species and subspecies differentially confer risk across populations, additional research is needed to identify women who would most benefit from intervention, to define how risk is modified by other host factors, and to tailor interventions to population and individual risk profiles.