# molBV reveals immune landscape of bacterial vaginosis and predicts human papillomavirus infection natural history

## Abstract

Bacterial vaginosis (BV) is a highly prevalent condition that is associated with adverse health outcomes. It has been proposed that BV’s role as a pathogenic condition is mediated via bacteria-induced inflammation. However, the complex interplay between vaginal microbes and host immune factors has yet to be clearly elucidated. Here, we develop molBV, a 16S rRNA gene amplicon-based classification pipeline that generates a molecular score and diagnoses BV with the same accuracy as the current gold standard method (i.e., Nugent score). Using 3 confirmatory cohorts we show that molBV is independent of the 16S rRNA region and generalizable across populations. We use the score in a cohort without clinical BV states, but with measures of HPV infection history and immune markers, to reveal that BV-associated increases in the IL-1β/IP-10 cytokine ratio directly predict clearance of incident high-risk HPV infection (HR = 1.86, 95% CI: 1.19–2.90). Furthermore, we identify an alternate inflammatory BV signature characterized by elevated TNF-α/MIP-1β ratio that is prospectively associated with progression of incident infections to CIN2+ (OR = 2.81, 95% CI: 1.62–5.42). Thus, BV is a heterogeneous condition that activates different arms of the immune response, which in turn are independent risk factors for HR-HPV clearance and progression. Clinical Trial registration number: The CVT trial has been registered under: NCT00128661.

## Introduction

Bacterial vaginosis (BV) is defined as vaginal dysbiosis with inflammation and accompanying symptoms including vaginal discharge1,2,3. According to Centers for Disease Control and Prevention (CDC) and NHANES studies, the prevalence of BV is 29.2% amongst reproductive-aged women living in the United States4,5. Globally this condition is estimated to have an economic burden of approximately \$5-billion per year6,7.

In addition to its ubiquity, BV is a urogenital condition that has been associated with adverse reproductive health outcomes including infertility8, increased risk for pre-term birth9, and low birth weights10. Moreover, an active state of BV is associated with an elevated risk for transmission of a variety of sexually transmitted infections (STIs) ranging from bacterial pathogens such as Chlamydia11 and Mycoplasma12, to viral agents including HIV13,14 and human papillomavirus (HPV)15. There is increasing interest in understanding the relationship between the cervicovaginal microbiome (CVM) and HPV natural history and progression to cancer16,17,18,19,20. In fact, differences in the CVM might explain why some high-risk HPV (HR-HPV) infections resolve, while others persist and progress. Lastly, BV is also associated with non-reproductive health issues such as obesity21.

Clinical BV is primarily diagnosed using Amsel criteria22, which require the presence of three out of four signs or symptoms: (1) homogeneous, thin, white discharge that smoothly coats the vaginal walls; (2) clue cells in a wet mount; (3) pH of vaginal fluid >4.5; and (4) a fishy odor from the vaginal discharge before or after addition of 10% KOH (i.e., whiff test). Although commonly used, this approach has been widely criticized for a considerable rate of misdiagnosis23. An alternative to Amsel’s criteria is the Nugent score, which creates a composite score based on counts of key bacteria morphologically identified on a Gram stain (i.e., Lactobacillus, Gardnerella, and curved Gram-negative rods)24. Although this method is more sensitive than the Amsel criteria24,25,26, it has been shown to suffer from interobserver variability27 and its use has primarily been limited to research settings due to the amount of time, expertise, and costs required to perform the test28. The term molecular BV has been introduced recently14, and it carries various meanings depending on the system used for molecular detection and the correlation with clinical, bacteriologic, and/or microscopic BV2,28. Here, it specifically refers to suboptimal states of the CVM that are usually associated with reduced levels of Lactobacillus as measured by molecular techniques.

Bacterial vaginosis has features of an inflammatory state and is associated with alterations of cervicovaginal cytokines29,30,31,32. A number of studies have reported the association of elevated IL-1β and BV31,32, whereas most immune markers associated with BV appear to differ across studies. It has been proposed that this variability may be due to small sample sizes, heterogeneity of study populations33, and/or differing microbial taxa within the CVM. It is important to identify the source of this variability since the pathogenic effects of BV appear to be associated with local inflammation14,32,34,35,36.

In this study, we describe a 16S rRNA gene amplicon sequencing-based algorithm, called molBV, that can reproducibly categorize BV using a Nugent-like 0-10 score across a variety of populations including those from the US and Africa. Using this molecular approach to identify BV, we report the association of a set of cervicovaginal cytokines with molBV-BV. In particular, we demonstrate that elevated levels of Lactobacillus iners may in part explain the detection of a BV-like inflammatory signature amongst molBV-BV negative women. Although there appears to be a predominant host immune response to molBV-BV, the CVM’s positive associations with alternative forms of inflammation are associated with specific microbial agents. We utilize these observations to explore risk factors for the rate of clearance and progression of oncogenic HPV. We provide evidence that an inflammatory cervical profile underlies the association of HR-HPV natural history with molBV-BV. Surprisingly, the alternative inflammatory pathway is associated with the progression of HR-HPV infections to neoplastic lesions. This study provides evidence of multiple host inflammatory pathways associated with the cervicovaginal microbiome that influence the outcome of cervicovaginal HPV infection and possibly other pathologic outcomes of bacterial vaginosis.

## Results

### Developing a molecular bacterial vaginosis scoring system

Initially, 60 young women (30 with and 30 without symptoms of BV) were recruited for evaluation by Nugent score, Amsel criteria, and 16S amplicon sequencing. Three samples were inadequate for study, leaving a training set of 57 participants (mean age = 21, range 15–25 years). Based on Amsel’s criteria, 22 were classified as BV-positive, whereas Nugent score evaluations categorized 26 with BV, 8 as intermediate for BV, and 23 as inconsistent with BV (Supplementary Table 1).

The 16S rRNA gene V4 region was amplified from cervicovaginal samples of all 57 participants, as it has been shown to robustly detect bacterial species from the cervicovaginal region37. There was an average (SD) of 16,580 (487) 16S reads per sample. Fungal sequencing of an ITS1 region amplicon using recently validated primers38 resulted in an average (SD) of 16,290 (3486) ITS reads per sample. Following taxonomic assignments and clustering by Euclidean distances (Fig. 1A), samples formed two primary clusters that were either defined by a dominance of two major species from the genus Lactobacillus (n = 27) or a state of polymicrobialism (n = 30). There was a highly significant tendency of the BV-positive samples to sort to the polymicrobial clade and BV negative samples to the Lactobacillus clade based on either the Amsel (p < 0.001) or Nugent BV diagnosis (p < 0.001). Hierarchical clustering using fungal communities revealed two primary clades: one dominated by Candida albicans, the other with a dominance of Malassezia restricta (Supplementary Fig. 1). The fungal community clustering showed no significant association with binary BV diagnosis, although some clustering was observed for both the Nugent (p-value = 0.18) and Amsel BV-positive samples (p-value = 0.22).

All alpha diversity measures (Chao1, Fisher, and Shannon, all p < 0.001) were highly associated with both Amsel and Nugent outcomes of BV (Fig. 1B). Beta diversity analyses using Jensen–Shannon Divergence (JSD) distances subsampled to 10,000 reads revealed that both the Amsel and Nugent criteria for BV were also significantly associated with the vaginal microbiome (R² = 0.25, p < 0.001, Fig. 1C; and R² = 0.59, p < 0.001, Fig. 1D, respectively). To identify specific taxa associated with BV using ANCOM, we focused on the samples with concordant results for BV by Amsel and Nugent criteria (Supplementary Table 1); 52 differentially abundant genera were identified (FDR < 0.05, Fig. 1E), with Lactobacillus being the dominant genus elevated in BV-negative women and a mixture of anaerobic Gram-negative bacteria such as Gardnerella elevated in BV-positive women, as expected39,40,41. There were no significant associations of fungal alpha or beta measures or specific fungal taxa identified with BV states (Supplementary Fig. 2A–C).
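As an illustration of the beta diversity metric named above, the following is a minimal sketch of the Jensen–Shannon divergence between two samples' taxon profiles. It is not the authors' pipeline: the function name, the base-2 logarithm, and the toy profiles are assumptions made for the example.

```python
import math


def jensen_shannon_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two abundance vectors.

    p and q hold counts or relative abundances for the same taxa, in the
    same order. Both are normalized to sum to 1 before comparison.
    """
    if len(p) != len(q):
        raise ValueError("profiles must cover the same taxa")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("profiles must contain at least one read")
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution used as the common reference.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i == 0 contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical read counts for (Lactobacillus, Gardnerella, Prevotella).
lacto_dominated = [9500, 300, 200]
polymicrobial = [500, 5500, 4000]
print(jensen_shannon_divergence(lacto_dominated, polymicrobial))
```

With a base-2 logarithm the value lies between 0 (identical profiles) and 1 (no shared taxa). Some libraries, for example SciPy's `scipy.spatial.distance.jensenshannon`, return the square root of this quantity, so it is worth checking which form a given tool reports.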

We sought to define a single molecular score from the 16S rRNA gene amplicon next-generation sequencing (NGS) data that would maximize generalizability of such a metric. Thus, we limited the markers to those taxa present in > 80% of all samples at a relative abundance of ≥ 0.001% after subsampling to 10,000 reads. We identified 11 genera meeting these criteria including Lactobacillus, Prevotella, Gardnerella, Megasphaera, Parvimonas, Clostridium, Porphyromonas, Adlercreutzia, Dialister, Atopobium, and Sneathia. We then derived a molBV score using robust regression modeling as described in the “Methods” section and created an averaged score of microbial taxa ratios for each sample providing a score of 0–10 similar to a Nugent score output.
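The marker-selection rule described above (taxa present in more than 80% of samples at a relative abundance of at least 0.001%) can be sketched as follows. This is an illustrative reading of the stated thresholds, not the authors' code; the function name, the input layout, and the toy counts are hypothetical.

```python
def prevalent_taxa(counts, min_prevalence=0.80, min_rel_abundance=1e-5):
    """Return taxa passing the prevalence filter, sorted by name.

    counts maps sample ID -> {taxon: read count}, assumed to be already
    subsampled to a common depth. A taxon counts as present in a sample
    when its relative abundance is >= min_rel_abundance (0.001% = 1e-5);
    it is kept when it is present in strictly more than min_prevalence
    of all samples.
    """
    n_samples = len(counts)
    if n_samples == 0:
        return []
    all_taxa = {taxon for sample in counts.values() for taxon in sample}
    kept = []
    for taxon in sorted(all_taxa):
        n_present = 0
        for sample in counts.values():
            depth = sum(sample.values())
            if depth > 0 and sample.get(taxon, 0) / depth >= min_rel_abundance:
                n_present += 1
        if n_present / n_samples > min_prevalence:
            kept.append(taxon)
    return kept


# Hypothetical counts for five samples at a depth of 10,000 reads.
example = {
    "s1": {"Lactobacillus": 9000, "Gardnerella": 1000},
    "s2": {"Lactobacillus": 5000, "Gardnerella": 5000},
    "s3": {"Lactobacillus": 100, "Gardnerella": 9900},
    "s4": {"Lactobacillus": 9999, "Gardnerella": 1},
    "s5": {"Lactobacillus": 10000, "Sneathia": 0},
}
print(prevalent_taxa(example))
```

Note that at a depth of 10,000 reads a single read already corresponds to 0.01%, so under this reading the abundance threshold is met by any taxon detected at least once, and the prevalence requirement does most of the filtering.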

### molBV prediction of Nugent BV using three independent clinical datasets

molBV was evaluated in publicly available datasets that included 16S rRNA gene amplicon NGS data and measures of bacterial vaginosis. One testing set contained 388 American women with available 16S data sequenced using the V1–V2 16S rRNA gene region (different from the V4 region used above) and clinical Nugent scores40. In addition, we identified two African populations, one collected in Cape Town (n = 90) and the other in Soweto (n = 78) that sequenced the V4 region of 16S and had Nugent measures of BV42. We ran the 16S amplicon NGS reads through the molBV pipeline to generate Nugent-like scores and observed a strong correlation between the clinical Nugent scores and the molBV scores in all three cohorts with ICC values between 0.71–0.81 (Supplementary Fig. 3A). We next assessed the molBV score as a discriminant tool for BV diagnostic categories similar to Nugent scores (BV-negative = 0–3 or BV-positive = 7–10)43. The molBV score showed high AUC values (0.88–0.98) in all three datasets and outperformed other measures of the microbiome such as alpha diversity measures Chao1 and Shannon and the relative abundance of Lactobacillus (Supplementary Fig. 3B-D). Thus, the molBV pipeline is a robust tool to convert 16S NGS data into BV categories independent of 16S amplicon region and population characteristics.
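For readers less familiar with the AUC used above, a minimal sketch follows. It computes the AUC as the probability that a randomly chosen BV-positive sample scores higher than a randomly chosen BV-negative sample (ties count as half). The function name and the example scores are hypothetical and are not taken from the study data.

```python
def auc(scores_positive, scores_negative):
    """Area under the ROC curve via pairwise comparison.

    Equivalent to the Mann-Whitney U statistic divided by the number of
    (positive, negative) pairs. Quadratic in sample size, which is fine
    for a sketch on small inputs.
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))


# Hypothetical molBV-like scores grouped by Nugent category.
nugent_positive = [8.1, 9.0, 6.5, 7.7]
nugent_negative = [1.2, 2.5, 7.0, 0.4]
print(auc(nugent_positive, nugent_negative))
```

An AUC of 0.5 corresponds to a score with no discriminating ability and 1.0 to perfect separation of the two categories.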

### The inflammatory landscape of BV

Previous studies indicated that vaginal dysbiosis is associated with an innate immune response32,44. To further investigate the host immune landscape and bacterial vaginosis, we utilized the molBV tool to recapitulate categories of vaginal dysbiosis where other measures of BV were unavailable. We utilized 431 baseline samples from individual women participating in the placebo arm of the Costa Rica Vaccine Trial (CVT) that had 32 cytokine proteins (i.e., cytokines, chemokines and soluble receptors) quantitated from cervical secretions collected with a sponge (see Methods)45. Using three ordinal categories of BV derived from the molBV scores equivalent to Nugent BV negative (molBV 0–3, n = 179), intermediate (molBV 4–6, n = 70) and positive (molBV 7–10, n = 182), we identified 13 cytokines significantly associated with a trend across the three BV states (Fig. 2A, all markers q < 0.001). Cytokine levels were also tested with respect to age, smoking and HPV16 status and did not show any significant associations (Supplementary Table 2). In order to validate the use of ordinal BV categories, we performed additional sensitivity analyses using categorical BV states and found that the categorical models did not provide a better fit (Supplementary Fig. 4). The strongest positive association of ordinal molBV states was with IL-1β (unit increase OR = 1.73, 95% CI: 1.56–1.92), whereas IP-10 was inversely associated with BV (OR = 0.76, 95% CI: 0.68–0.85). Supplementary Table 3 shows the cytokines associated with molBV-BV in univariate analysis; 6 have been previously associated with BV and 7 additional cytokines are described in this report. Given that inflammation is a complex host response, we analyzed the interrelationships amongst the 32 cytokines to identify patterns of expression. A correlation network was constructed connecting those cytokines with a Pearson correlation >0.6 (see Fig. 2B).
Since 18 of the markers show a strong correlation to at least one other marker, we sought to further improve the identification of differentially abundant inflammatory signals by using cytokine ratios that overcome some issues with compositional data and relative abundances46. Figure 2C shows a volcano plot indicating cytokine ratios (in red) with a q < 3.77 × 10⁻⁴⁴ threshold. The ratios with the strongest effects were IL-1β/IP-10 and IP-10/TNF-α, shown at the far right and left, respectively. To further examine the relationships of cytokine ratios, we created a matrix showing the pairwise correlations associated with molBV-BV (Fig. 2D). Six highly correlated ratios share the IL-1β cytokine and 2 other highly correlated ratios share TNF-α (the 1st and 2nd highest ORs for a given cytokine, respectively, in the univariate analysis shown in Fig. 2A). Of the BV-associated ratios, IL-1β/IP-10 had the strongest overall effect based on absolute odds ratio. Interestingly, IL-1β and IP-10 have been previously shown to be strongly associated with BV47, making this ratio very attractive for further consideration.
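The two steps described above, linking cytokines whose Pearson correlation exceeds 0.6 and forming pairwise cytokine ratios, can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and toy values are hypothetical, and taking the natural log of each ratio is a choice made for this example rather than a detail given in the text.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_edges(levels, threshold=0.6):
    """Return marker pairs whose Pearson correlation exceeds threshold.

    levels maps marker name -> list of measurements, one per sample and
    in the same sample order for every marker.
    """
    return [
        (a, b)
        for a, b in combinations(levels, 2)
        if pearson(levels[a], levels[b]) > threshold
    ]


def log_ratios(sample):
    """Return log(A/B) for every unordered marker pair in one sample.

    sample maps marker name -> positive concentration. Swapping A and B
    only flips the sign, so each pair is reported once.
    """
    return {
        f"{a}/{b}": math.log(sample[a] / sample[b])
        for a, b in combinations(sample, 2)
    }


# Hypothetical measurements for three markers across four samples.
levels = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5], "C": [4, 1, 3, 2]}
print(correlation_edges(levels))
print(log_ratios({"IL1B": 8.0, "IP10": 2.0}))
```

A ratio of two markers measured in the same sample is unaffected by anything that scales both measurements equally, such as differences in the volume of secretion collected, which is one reason ratios are attractive for this kind of data.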

Exploring the distribution of the IL-1β/IP-10 ratio across the molBV-BV ordinal states revealed that despite the clear and consistent association (linear trend p-value = 3 × 10⁻⁴⁸) with BV (Fig. 2E), 24% (85/349) of women with molBV-BV did not have elevated IL-1β/IP-10 levels. Surprisingly, 28% (88/309) of BV-negative women had elevated IL-1β/IP-10 levels (see left violin plot, Fig. 2E). We used ANCOM to explore potential causes of molBV-negative women having high IL-1β/IP-10 levels; this analysis identified elevated levels of L. iners and G. vaginalis, whereas those with elevated levels of L. crispatus had low levels of IL-1β/IP-10 (W-stat threshold >10, FDR < 0.05, Fig. 2F).

To determine whether women with molBV-BV who did not show an elevated IL-1β/IP-10 signature had an alternative form of BV-associated inflammation, we compared this group to BV-negative women with similarly low levels of IL-1β/IP-10 (i.e., below the cohort median, Fig. 2G). The analysis revealed that TNF-α/MIP-1β was significantly positively associated with molBV-BV (OR = 1.24, 95% CI: 1.19–1.29) in the absence of elevated IL-1β/IP-10.

### Molecular BV, cervical inflammation, and the natural history of HPV

We previously reported that increased diversity of the cervicovaginal microbiome contributed to HPV natural history16. To evaluate the impact of bacterial vaginosis and directly test whether associated inflammation could be mediating the effect of BV on the natural history of HPV, we utilized the previously reported16 prospective cohort sub-study from the CVT trial. We utilized 16S NGS data from cervicovaginal DNA16 to calculate the molBV scores across two study visits (307/431 baseline participants had sequenced 16S V4 and cytokine data). Women who had sustained low levels of molBV vs. those that had sustained high molBV scores were more likely to clear HR-HPV over time (Fig. 3A, p = 0.02). Briefly, sustained levels refer to women that had molBV levels above or below the cohort median for both of the measured visits (203/307 were included). Similarly, sustained high levels of BV-associated inflammation vs. low, as determined by IL-1β/IP-10, were associated with lower rates of HR-HPV clearance (Fig. 3B, p = 0.004). Sustained levels of this measure were also defined using stratification by the cohort median and agreement of the measure (above or below) at the two measured visits (183/307 were included). For detailed definitions of sustained levels of molBV and IL-1β/IP-10 inflammation see the subsection “HPV natural history exposure/outcome definitions” in the “Methods” section of the manuscript.
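The "sustained level" definition summarized above can be illustrated with a short sketch. It rests on assumptions that the text does not settle: the median is computed separately for each visit, values equal to the median are treated as low, and all names and numbers are hypothetical. The authoritative definitions are those in the Methods subsection cited above.

```python
from statistics import median


def sustained_groups(visit1, visit2):
    """Classify participants by agreement across two visits.

    visit1 and visit2 map participant ID -> measurement (for example a
    molBV score or an IL-1b/IP-10 ratio). Only participants measured at
    both visits are classified. A value is "high" when it is strictly
    above that visit's cohort median (an assumption of this sketch).
    """
    shared = sorted(set(visit1) & set(visit2))
    median_v1 = median(visit1[p] for p in shared)
    median_v2 = median(visit2[p] for p in shared)
    groups = {}
    for p in shared:
        high_v1 = visit1[p] > median_v1
        high_v2 = visit2[p] > median_v2
        if high_v1 and high_v2:
            groups[p] = "sustained_high"
        elif not high_v1 and not high_v2:
            groups[p] = "sustained_low"
        else:
            # Participants who change category would be handled
            # separately from the sustained groups.
            groups[p] = "discordant"
    return groups


# Hypothetical scores for four participants at visits V1 and V2.
v1 = {"a": 1, "b": 2, "c": 8, "d": 9}
v2 = {"a": 2, "b": 9, "c": 7, "d": 8}
print(sustained_groups(v1, v2))
```

Under this reading, only the participants labeled `sustained_high` or `sustained_low` would enter a sustained high vs. low comparison, which is consistent with fewer than 307 women being included in each analysis.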

To determine whether molBV-BV and/or sustained BV-associated inflammation markers acted as independent risk factors for HR-HPV clearance, we used Cox proportional hazards models after covariate adjustment. Table 1 presents the effects of molBV and IL-1β/IP-10 levels adjusted for age, smoking status and HPV16. Model 1 considered the dichotomized molBV states and found that, compared to having sustained low levels of molBV (reference), the transition from high (V1) to low (V2) was significantly associated with a lower likelihood of clearing an HR-HPV infection (HR = 0.55, 95% CI: 0.31–0.97). Model 2 considered the effect of both IL-1β/IP-10 and molBV states. In this cytokine-adjusted analysis, molBV levels were not associated with HR-HPV clearance, suggesting that IL-1β/IP-10 was an independent driver of HPV clearance (HR = 1.87, 95% CI: 1.08–3.20). In this context, molBV states were not significant with the exception of the marginal signal from the group that became low at visit 2 (HR = 0.38, 95% CI: 0.15–1.00). Given the strong correlation between molBV and IL-1β/IP-10 levels (Fig. 2E), we made an additional parsimonious model (Model 3) to more accurately measure the effect of sustained low IL-1β/IP-10 levels on increasing the likelihood of HR-HPV clearance (HR = 1.86, 95% CI: 1.19–2.90). This model did not change the hazard ratio of sustained low IL-1β/IP-10 with HR-HPV clearance, but it did substantially reduce the p-value, supporting the interpretation that it is the true driver of HR-HPV clearance.

We next evaluated the association of BV and HR-HPV progression to CIN2+ (ref. 16). Briefly, the original study considered the binary outcome of persistent HR-HPV progressing to CIN2+ (diagnosed ~2 years after the second visit sample16) vs. HR-HPV infection clearance with CVM components serving as predictors. Using values from V2, we first tested whether molecular BV was associated with progression to CIN2+ using a generalized linear model (Table 2, Model 1). The results indicated that a continuous molBV score was prospectively associated with increased risk for CIN2+ progression, with the odds of progression increasing by a factor of 1.24 (95% CI: 1.02–1.55) per unit increase of molBV. However, the dominant BV IL-1β/IP-10 signature was not significant when molBV was also in the model (Table 2, Model 2, OR = 1.15, 95% CI: 0.92–1.46). Remarkably, the alternative BV-associated inflammation signature represented by TNF-α/MIP-1β levels was a predictor of progression when either molBV (Table 2, Model 3, OR = 2.71, 95% CI: 1.46–5.61) or molBV and IL-1β/IP-10 were in the model (Table 2, Model 4, OR = 2.65, 95% CI: 1.39–5.68). Given the association of TNF-α/MIP-1β with molBV (Fig. 2C) and the modest correlation with IL-1β/IP-10 (Fig. 2D), an additional parsimonious model was constructed to more accurately measure the effect of TNF-α/MIP-1β on progression to CIN2+ without molBV or IL-1β/IP-10 (Table 2, Model 5, OR = 2.81, 95% CI: 1.65–5.42).
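To make the "per unit increase" odds ratios above concrete, the following worked example assumes that the generalized linear model is a logistic regression with a single continuous predictor $x$ (the molBV score); the covariate terms are omitted for brevity, and the three-unit comparison is an illustrative choice rather than a result reported in the study.

```latex
\begin{aligned}
\log\frac{p(x)}{1-p(x)} &= \beta_0 + \beta_1 x,
  \qquad \mathrm{OR} = e^{\beta_1} = 1.24 \\[4pt]
\frac{\mathrm{odds}(x+k)}{\mathrm{odds}(x)} &= e^{\beta_1 k} = \mathrm{OR}^{\,k} \\[4pt]
k = 3:\quad \mathrm{OR}^{3} &= 1.24^{3} \approx 1.91
\end{aligned}
```

Under these assumptions, two women whose molBV scores differ by three units would differ in their odds of progression to CIN2+ by a factor of roughly 1.9, because per-unit odds ratios multiply rather than add.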

## Discussion

In this study, we develop a relatively simple means to characterize cervicovaginal samples for BV using 16S rRNA gene amplicon next-generation sequencing (NGS) data. It is well known that features of the CVM are strongly associated with BV39,40,48,49,50,51,52, and our study takes this relationship further and provides a quantitative score of 0–10, equivalent to the Nugent score. This method is particularly useful for high-throughput analyses to determine a “molecular” Nugent-like score (molBV-BV) in women that might not have been evaluated for BV, but have an available cervicovaginal sample. Development of this method used a stringent diagnosis of BV including subjects concordant for BV by both Nugent and Amsel criteria. The method was validated in three additional cohorts, suggesting the generalizability of this particular molecular approach to generate a Nugent-like score. Although it is known that there is substantial variation in the cervicovaginal microbiome between African women and women with European ancestry53, use of the molBV algorithm in two African populations revealed high diagnostic AUCs for BV of 0.97 and 0.88 for the Soweto and Cape Town sets, respectively. Moreover, the molBV diagnostic was also used to evaluate the local host inflammatory response in a population without available BV measures (i.e., Nugent or Amsel). We identified a total of 13 different cytokines associated with molBV-BV, of which 7 had not been previously reported (Supplementary Table 3 and ref. 32). In addition, we utilize the molBV score and cytokine data to demonstrate the contribution of each to the natural history of HR-HPV infections using a prospective study design. Another interesting feature of this study was the demonstration of a prospective association of TNF-α with CIN2+ (previously reported in a cross-sectional study by Łaniewski et al. 54).

Cytokine ratios were used in order to better address the interrelated data structure and we observed that increasing IL-1β/IP-10 ratios were strongly correlated with increasing molecular BV scores (linear trend p-value = 3.67 × 10⁻⁴⁸). The IL-1β/IP-10 ratio was previously postulated to be a relevant signature for BV and the identification of women at higher risk of STI transmission47.

Despite a strong correlation between IL-1β/IP-10 and molBV states, 24–28% of women had an IL-1β/IP-10 level discordant with their molBV state, including women with an elevated ratio despite a molBV score of 0–3. Upon further analyses, elevated L. iners and G. vaginalis were identified in this enigmatic group. This finding is of interest since dominance of Lactobacillus species in the CVM is typically associated with vaginal health40,49. In the CVT cohort, L. crispatus was inversely associated with IL-1β/IP-10 inflammation, consistent with previous reports40. It is of interest that women in the discovery set within a Lactobacillus-dominated clade and having clinical features of BV also had elevated levels of L. iners (Fig. 1A). It is possible that the association of L. iners with a BV-associated inflammatory state is due to strain-level variation, since the biology of L. iners is not well understood55,56,57. This species differs from other members of the genus in many respects, including genome characteristics that set it apart from other lactobacilli56. Furthermore, certain strains of L. iners appear to carry unique genes, such as those that encode inerolysin58. France et al.51 postulated that these genes appear to have been horizontally transferred to L. iners from G. vaginalis and that this allows certain members of this species to directly extract nutrients from host cells, which may explain how it can persist in a sub-optimal CVM and possibly induce vaginal inflammation. Deeper sequencing of the CVM will be required to validate this hypothesis. Alternatively, the community context of L. iners might influence its behavior and association with inflammation. It appears that specific bacteria could have context-specific functionality depending on the observed molecular BV state at the time of sampling.
Although the molBV algorithm was set to clinical parameters of BV, the current work suggests that additional stratification to identify women at risk for sub-clinical inflammation is necessary. This is especially important given the presented data showing the significance of these inflammatory shifts in the context of HR-HPV infections and possible implications for other diseases in which BV acts as a risk factor. Lastly, L. iners may reflect associations with other unmeasured determinants leading to BV56.

An additional consideration that may play a role in the pathogenesis of BV, beyond bacterial biomarkers, is microbial biomass. Amplicon sequencing of the microbiome is compositional in nature and does not provide a direct means to establish exact microbial biomass46. This variable was previously shown to be relevant with the qPCR technique that allowed intermediate BV states to be resolved by quantifying bacterial loads of G. vaginalis and A. vaginae59. It would be of interest to incorporate a quantitative technique such as the one developed by Morton et al. 46 to further expand molecular characterization of BV in future studies.

Another remarkable aspect of the cytokine analysis of this study is the heterogeneity of immune markers that were elevated across the strata of molecular BV states. It would be reasonable to predict that having an elevation of the predominant cytokine ratio (i.e., IL-1β/IP-10) would yield similar distributions of other cytokines given the correlation levels observed in Fig. 2B and D. However, when comparing the molecular BV-positive to BV-negative women with low levels of IL-1β/IP-10, we observed that the BV-positive women had elevated levels of TNF-α/MIP-1β inflammatory cytokines. In fact, TNF-α/MIP-1β was the dominant signature elevated in molecular BV-positive women with low levels of IL-1β/IP-10. These results indicate that BV seems to be heterogeneous as to the exact type of sub-clinical host inflammatory response. Given the identifiable variance of inflammatory levels within individual strata of BV due to specific organisms such as L. iners, it is likely that specific taxa, or bacterial networks, might be associated with this observation. Further studies with deeper sequencing to identify the possible microbial genetic basis for these observations are needed.

There is increased interest in uncovering the relationship between the cervicovaginal microbiome and HPV natural history, since a number of studies show a correlation between vaginal microbial diversity, BV, and HPV clearance16,17,18,19,20. It is not currently known why certain high-risk infections clear while a small minority persist for years and eventually progress to pre-cancer60. We utilized HR-HPV detection data within the Costa Rica Vaccine Trial cohort45 to evaluate cervical cytokine profiles, 16S rRNA gene amplicon NGS data and the newly developed molecular BV states to interrogate possible mechanisms of HR-HPV infection clearance. Kaplan-Meier analyses revealed similar clearance patterns amongst women with sustained low vs. high IL-1β/IP-10 levels and low vs. high molBV scores (see Fig. 3), although the cytokine measure showed a stronger association with HR-HPV clearance. To better understand these relationships, we utilized Cox proportional hazards modeling with adjustment for age, smoking status, and HPV16 infection. The association with molBV-BV was eliminated once we adjusted for an elevated IL-1β/IP-10 ratio, possibly indicating that a specific type of inflammation associated with BV was driving the relationship with HR-HPV persistence or clearance.

Another relevant HPV outcome is progression of a persistent HR-HPV infection to pre-cancer (CIN2+)58. The CVM was previously reported to be predictive of this outcome16,20,36,61. However, when we tested the continuous IL-1β/IP-10 levels in a model, this cytokine signature did not add any additional information beyond molBV for HR-HPV progression. Surprisingly, when we tested the TNF-α/MIP-1β signature that is also associated with certain characteristics of molBV, we found that it was associated with CIN2+ progression even after adjustment for molBV as well as molBV and IL-1β/IP-10 (Table 2). The final analysis revealed that a single unit increase in the TNF-α/MIP-1β ratio was associated with an odds ratio of 2.81 (95% CI: 1.65–5.42) for developing CIN2+ within 2 years of V2 sample collection. The data presented in the current report suggest that the relationship between BV and the host response is highly heterogeneous: although BV is consistently associated with certain microbial shifts and overall community structure (e.g., higher alpha diversity), the host response can also be modified by the presence of specific taxa. These results may explain why certain studies do not see an association with CVM diversity (a surrogate for BV), but do see signals when analyzing specific bacteria62. Based on the observations reported in this study, these variations appear to have a substantial effect on the immune response, which in turn has an effect on HR-HPV clearance and progression to CIN2+.

The currently reported analyses have several weaknesses that should be taken into account when interpreting the data. All of the analyses of the CVM utilized 16S rRNA gene amplicon sequencing. This method limits the taxonomic resolution of bacteria and other organisms constituting the microbiome. A deeper exploration of BV using techniques such as shotgun metagenomics may provide a more thorough explanation as to why there is significant heterogeneity in the local host inflammatory response to BV amongst different women. Additionally, although our core analysis utilized compositionally aware approaches, rarefaction was used when calculating beta diversity, which may bias the magnitude of the PERMANOVA result. The relationships between molBV and the analyzed cytokines were considered in a linear context; other non-ordinal relationships might exist and are worthy of future investigation. Moreover, the developed molBV score was dependent on a relatively small set of bacteria and was based on “clean” BV diagnoses in which Amsel and Nugent’s tests agreed. This choice was made in order to facilitate the robustness of the measure. This appears to have been effective based on the associations seen within African populations but may present a limitation in populations with different structures of the CVM, especially ones that may occur in women where the two clinical scoring systems are discordant. Most of the women in the reported study are adolescents or young adults and we do not know if the analyses extend to women of all ages and geographic locations. Moreover, it is possible that other inflammatory signatures might exist in the cervicovaginal region that were not measured in the current report.
Finally, in using the clinical Nugent score to guide our analyses we may have inadvertently missed important physiological phenomena of BV that are inherently diluted by this clinical score; future studies should utilize additional CVM reduction techniques such as the recently developed VALENCIA51, which produces community state types that are clinically agnostic.

Here we present a comprehensive molecular characterization of BV using 16S rRNA gene amplicon sequencing and a curated panel of cytokines. We demonstrate using multiple cohorts that 16S amplicon sequencing can be reliably used to diagnose BV employing the newly developed molBV score. We further demonstrate that this score was strongly correlated with a heterogeneous inflammatory landscape within the cervicovaginal region. Exploring these inflammatory markers further revealed a complex system of interactions between individual taxa, specific cytokines, and molecular BV states. In addition, we demonstrated that there is the potential for clinical relevance of the findings through the use of HR-HPV outcomes. We specifically show that different possible inflammatory states in BV are either associated with persistence of HR-HPV infection (i.e., IL-1β/IP-10) or the progression of infection to precancer (i.e., TNF-α/MIP-1β). In support of a role for TNF-α in the progression of HR-HPV infections, a recent report indicated that TNF-α was the main discriminatory biomarker associated with invasive cervical cancer63.

Whether the adverse health outcomes from BV are all based on the host inflammatory response remains to be rigorously evaluated. Deeper exploration of these associations is warranted using more robust techniques such as machine learning in order to further understand why certain women experience inflammation with BV while others do not, and how the host-microbiome relationships impact health. Lastly, a local inflammatory response induced by BV might also signal systemic inflammation, which was not assessed in this study. The role BV-induced inflammation might have in immune conditions more prevalent in women remains an interesting hypothesis that could have profound diagnostic and therapeutic ramifications.

## Methods

### Bacterial vaginosis training set

This component of the study was conducted within an ongoing HPV study at Mount Sinai Adolescent Health Center (MSAHC) in New York City64,65. Cervicovaginal samples were collected from female patients, 15–25 years of age, with vaginal symptoms suggestive of BV (n = 30) or no symptoms (n = 30); both groups were recruited sequentially from the same clinic. Pregnant women were excluded. The parent study and BV sub-study were approved by the Institutional Review Board at The Icahn School of Medicine at Mount Sinai.

### Diagnosis of bacterial vaginosis

Subjects were evaluated for Amsel criteria by the examining physician, and the presence of 3 out of 4 Amsel criteria established a diagnosis of BV66. Vaginal swabs were collected for Nugent scores by carefully inserting a sterile swab about two inches into the vagina, gently rotating it against the vaginal wall for 10–30 s, and then withdrawing it without touching the skin to avoid contamination. De-identified swabs were placed in a plastic culturette tube and shipped overnight to an outside clinical laboratory for Nugent scoring following standardized criteria24. A composite score from 0 to 10 was generated with a diagnosis of Nugent BV assigned as follows: 0–3 was considered negative, 4–6 was considered intermediate, and 7–10 was considered diagnostic of BV.

### Microbiome sample collection and DNA extraction

Samples for microbiome analyses were collected using a Cytobrush® placed in PreservCyt transport medium (ThinPrep®; Hologic, Marlborough, MA). Samples were stored immediately at −20 °C until transport to the research lab at the Albert Einstein College of Medicine. In the lab, the samples were transferred to a 15 ml tube and gently centrifuged at 1500 RPM for 5 min. After removing the supernatant by decanting, the pellets were rinsed in 3 ml of TE (10 mM Tris, 1.0 mM EDTA). This solution was then vortexed and centrifuged at 1500 RPM for 5 min and the supernatant was removed by decanting. The remaining pellet and leftover solution (~150 µl) were used for DNA isolation via column processing with the QIAamp Mini spin column (Qiagen, Valencia, CA) following the manufacturer’s protocol. The purified DNA was eluted in 150 µl of elution buffer (10 mM Tris/0.5 mM EDTA, pH 9).

### PCR amplification

PCR for bacterial communities was performed using forward (515F) GTGYCAGCMGCCGCGGTA and reverse (806R) GGACTACHVGGGTWTCTAAT primers, which amplify the V4 hypervariable region of the prokaryotic 16S rRNA gene67,68. All primers contained unique Golay barcodes to allow for dual indexing of each sample. PCRs were conducted in a 25 µl reaction with 2 µl input of template DNA, 16.75 µl of ddH2O, 2.5 µl of Platinum 10× PCR buffer (Invitrogen, Waltham, MA), 0.75 µl of MgCl2 (50 mM, Invitrogen), 0.5 µl of dNTP mix (10 mM each, Roche, Basel, Switzerland), 0.25 µl of AmpliTaq Gold polymerase (5 U/µl, Applied Biosystems, Carlsbad, CA), 0.25 µl of Platinum Taq DNA Polymerase (10 U/µl, Invitrogen), and 1 µl (5 µM) of each primer (IDT, Coralville, IA). Thermocycling conditions included an initial denaturation at 95 °C for 5 min, followed by 15 cycles of 95 °C for 1 min, 55 °C for 1 min, 72 °C for 1 min, followed by 15 cycles of 95 °C for 1 min, 60 °C for 1 min, 72 °C for 1 min, and a final extension at 72 °C for 10 min.

PCR for fungal communities was performed using barcoded forward (48F) ACACACCGCCCGTCGCTACT and reverse (217R) TTTCGCTGCGTTCTTCATCG primers that amplify the ITS1 region of the eukaryotic ribosomal gene cluster38,69. PCRs were conducted in a 25 µl reaction with 10 µl input of template DNA, 8.75 µl of ddH2O, 2.5 µl of Platinum 10× PCR buffer (Invitrogen), 0.75 µl of MgCl2 (50 mM, Invitrogen), 0.5 µl of dNTP mix (10 mM each, Roche), 0.25 µl of AmpliTaq Gold polymerase (5 U/µl, Applied Biosystems), 0.25 µl of Platinum Taq DNA Polymerase (10 U/µl, Invitrogen), and 1 µl (5 µM) of each primer (IDT, Coralville, IA). Thermocycling conditions included an initial denaturation at 95 °C for 5 min, followed by 35 cycles of 95 °C for 30 s, 55 °C for 30 s, 72 °C for 2 min, followed by a final extension at 72 °C for 10 min. All PCRs were conducted in a GeneAmp PCR System 9700 (Applied Biosystems) and PCR products were verified by gel electrophoresis.
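As a quick arithmetic check of the two reaction recipes above, the following minimal sketch sums the component volumes and converts stock concentrations to final concentrations in the 25 µl reaction. The volumes and stock concentrations are copied from the text; the dictionary and function names are hypothetical and used only for illustration.

```python
# Component volumes in microlitres, copied from the 16S V4 recipe in the text.
BACTERIAL_16S_V4 = {
    "template DNA": 2.0,
    "ddH2O": 16.75,
    "10x PCR buffer": 2.5,
    "MgCl2 (50 mM)": 0.75,
    "dNTP mix (10 mM each)": 0.5,
    "AmpliTaq Gold (5 U/ul)": 0.25,
    "Platinum Taq (10 U/ul)": 0.25,
    "forward primer (5 uM)": 1.0,
    "reverse primer (5 uM)": 1.0,
}

# The ITS1 recipe differs only in template and water volumes.
FUNGAL_ITS1 = {**BACTERIAL_16S_V4, "template DNA": 10.0, "ddH2O": 8.75}


def total_volume(recipe):
    """Sum of all component volumes (ul)."""
    return sum(recipe.values())


def final_concentration(stock, volume, reaction_volume=25.0):
    """Dilution of a stock: C_final = C_stock * V_added / V_total."""
    return stock * volume / reaction_volume


mgcl2_mM = final_concentration(50.0, 0.75)   # MgCl2, mM
primer_uM = final_concentration(5.0, 1.0)    # each primer, uM
dntp_mM = final_concentration(10.0, 0.5)     # each dNTP, mM
```

Under these numbers both recipes add up to the stated 25 µl, with final concentrations of 1.5 mM MgCl2, 0.2 µM of each primer, and 0.2 mM of each dNTP.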

### Next Generation Library preparation and sequencing

PCR products for each sample were pooled by PCR assay (16S and ITS1) in approximately equal concentrations and 100 µl of the pooled products were loaded into a 3% agarose gel and run at 80 V for 3 h to separate the DNA fragments. The DNA fragment for each assay was excised and purified with a QIAquick Gel Extraction Kit (Qiagen) and quantified using a Qubit High Sensitivity dsDNA assay (Invitrogen). NGS library preparation was conducted on the purified pooled PCR samples from each assay with a KAPA LTP Library Preparation Kit (KAPA Biosystems, Wilmington, MA) according to the manufacturer’s protocol. The library amplicons were validated on a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA) and sequencing of libraries was carried out on an Illumina MiSeq using a 2 × 300 bp paired-end read kit at the Genomics Core of the Albert Einstein College of Medicine.

### Bioinformatics

Illumina reads were initially right-trimmed to remove bases that fell below a Phred score of 25 using PRINSEQ-lite70. Reads were then demultiplexed using NovoBarcode based on unique dual Golay barcode combinations71.
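The idea of right-trimming by quality can be illustrated with a short sketch. This is not the PRINSEQ-lite implementation; it simply removes bases from the 3′ end of a read until a base with a Phred score of at least 25 is reached. The function names and the Phred+33 quality encoding are assumptions made for illustration.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def right_trim(sequence, quality_string, min_score=25, offset=33):
    """Drop trailing bases whose Phred score is below min_score.

    Trimming stops at the first base (scanning from the 3' end) that
    meets the threshold; bases upstream of it are left untouched.
    """
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_score:
        end -= 1
    return sequence[:end], quality_string[:end]


# 'I' encodes Q40, ':' Q25, '5' Q20, '+' Q10 and '#' Q2 under Phred+33.
trimmed_seq, trimmed_qual = right_trim("ACGTACGT", "IIIII5+#")
```

In this example the last three bases fall below Q25 and are removed, leaving the first five bases of the read.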

QIIME272 was used to identify amplicon sequence variants (ASVs) using DADA273 for both the 16SV4 and ITS1 amplicon data. For 16SV4 ASVs, the naïve Bayesian classifier74 was used to assign taxonomy using the lab’s custom database, which is composed of GreenGenes 13.875, HOMD76, and vaginal reference sequences77. For fungal taxonomic assignments, BLAST78 was used with the UNITE database79. Taxonomic assignment was combined with the ASV data using custom bash scripts into a BIOM file80 and further processed with R81.

### Statistical analysis

The phyloseq82 package was used to import microbiome data into R83 and to calculate the Chao1, Fisher and Shannon alpha diversity measures as well as the Jensen–Shannon divergence for beta diversity analyses. The vegan84 package was used to run the PERMANOVA. The pROC package was used for the AUC analyses85. All data visualization was achieved using the ggplot2 package86.
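For readers unfamiliar with two of these measures, the sketch below computes the Shannon index and the Jensen–Shannon divergence from raw count vectors. It is a plain Python illustration of the definitions, not the phyloseq code used in the study; the use of the natural logarithm and all function names are assumptions.

```python
import math


def relative_abundance(counts):
    """Convert raw counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with p > 0."""
    return -sum(p * math.log(p) for p in relative_abundance(counts) if p > 0)


def _kl(p, q):
    """Kullback-Leibler divergence; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(counts_a, counts_b):
    """JSD = 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M the mean of P and Q."""
    p = relative_abundance(counts_a)
    q = relative_abundance(counts_b)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
```

With natural logarithms the divergence ranges from 0 for identical communities to ln 2 for communities that share no taxa.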

The significance of belonging to taxon-specific hierarchical clusters in the heatmap analysis was assessed using Fisher’s exact test. Pairwise statistical significance in alpha diversity was determined using the Wilcoxon test. Significance in beta diversity was determined using PERMANOVA. ANCOM87 was used for bacterial taxa (i.e., biomarker) discovery. A linear model was used to determine significance of trends in the cytokine analyses and to extract ordinal ORs. Pearson coefficient was used for correlation analysis. The q-value88 package was used to correct the calculated linear trend p-values for multiple testing. Standard error of the mean was used to represent variation in sequencing depth between samples. Cox proportional hazards models were used to adjust the data for age, smoking, and HPV16 status in the survival analyses. The goodness of fit was assessed using the gof function from the survMisc package89 in all models and shown to have satisfactory performance at the 0.05 alpha threshold level. Age was treated as a continuous variable, HPV16 as binary (0 or 1) based on PCR results (V2), and smoking status as ordinal (0 = never smoker, 1 = former smoker, and 2 = current smoker); all three covariates were taken from V2.

### Calculating molBV

ANCOM87 was used to determine which bacterial genera were associated with BV through the use of microbial reference frames46 (Fig. 1E). Only those samples that were positive, or negative, for BV by both Amsel and Nugent criteria were used in the analysis (n = 18 BV-positive, n = 22 BV-negative). Out of the identified biomarkers, only those that were present in ≥ 80% of samples with at least a 0.01% relative abundance after subsampling were retained for the calculation of molBV. These taxa were Lactobacillus, Prevotella, Gardnerella, Megasphaera, Parvimonas, Clostridium, Porphyromonas, Adlercreutzia, Dialister, Atopobium, and Sneathia.
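The prevalence filter described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' script: the subsampling step is omitted, relative abundance is computed per sample from the counts supplied, and the function name, data layout, and toy counts are hypothetical.

```python
def prevalent_taxa(samples, min_rel_abundance=1e-4, min_prevalence=0.80):
    """Return taxa reaching min_rel_abundance (0.01%) in >= 80% of samples.

    samples maps a sample ID to a dict of {taxon: read count}.
    """
    taxa = set().union(*(counts.keys() for counts in samples.values()))
    n_samples = len(samples)
    kept = []
    for taxon in sorted(taxa):
        hits = 0
        for counts in samples.values():
            depth = sum(counts.values())
            if depth > 0 and counts.get(taxon, 0) / depth >= min_rel_abundance:
                hits += 1
        if hits / n_samples >= min_prevalence:
            kept.append(taxon)
    return kept


# Toy counts, invented purely to exercise the filter.
toy_samples = {
    "s1": {"Lactobacillus": 9000, "Sneathia": 1000, "Rare": 0},
    "s2": {"Lactobacillus": 5000, "Sneathia": 5000, "Rare": 10},
    "s3": {"Lactobacillus": 9990, "Sneathia": 10},
    "s4": {"Lactobacillus": 100, "Sneathia": 9900, "Rare": 5},
    "s5": {"Lactobacillus": 10000, "Rare": 3},
}
retained = prevalent_taxa(toy_samples)
```

In the toy data, "Sneathia" passes the abundance cut-off in four of five samples (exactly 80%) and is retained, whereas "Rare" passes in only three and is dropped.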

To create the microbial reference frames, log ratios were created using Lactobacillus and the markers elevated in the BV-positive group (with Lactobacillus serving as the numerator in all ratios). The log ratios were then analyzed using a robust regression with the Nugent scores serving as the outcome and each ratio as the predictor. The beta coefficients and intercepts for each of the ratios were extracted and are presented in Supplementary Table 6.

To calculate molBV, which is the imputed continuous Nugent score, the log ratios between Lactobacillus and BV-specific markers were generated (e.g., Lactobacillus:Prevotella, Lactobacillus:Gardnerella, Lactobacillus:Shuttleworthia, etc.) and used, along with the data from Supplementary Table 6, to calculate molBV:

$${molBV}=\frac{\sum_{i=1}^{n}\left(\beta_{0i}+\beta_{1i}X_{i}\right)}{n}$$

In the formula, Xi represents log ratio i, β0i is the ratio’s corresponding intercept, β1i is its beta coefficient, and n is the number of valid log ratios. For an estimate to be valid for a given reference taxon, both Lactobacillus and the BV marker had to be detected with a minimum of 1 read each. The final molBV score is the average of the valid log-ratio estimates that approximate the clinical Nugent score. Given the nature of regression prediction, molBV is not bound by the 0–10 range of the Nugent score and may take on non-integer values. To make the two scales more comparable, molBV was fit into the 0–10 range by using the following formula:

$$\mathrm{molBV\_scaled}=\frac{{molBV}-\min({molBV})}{\max({molBV})-\min({molBV})}\times 10 \tag{1}$$

In the formula molBV represents the raw score obtained from the above calculation, the max(molBV) is the highest calculated value in the cohort, the min(molBV) is the lowest calculated value in the cohort and molBV_scaled represents the final molBV score that falls into the desired 0–10 range.
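The two steps above (averaging the per-ratio regression estimates, then rescaling within the cohort) can be sketched as follows. This is a hedged illustration, not the published script (which is available from the repository listed under Data availability). The coefficient values are placeholders rather than the values in Supplementary Table 6, the use of the natural logarithm and of Lactobacillus over marker as the ratio direction (following the Lactobacillus:Prevotella notation) are assumptions, and the factor of 10 is included so that the output spans the stated 0–10 range.

```python
import math


def molbv_raw(counts, coefficients, reference="Lactobacillus"):
    """Average the regression estimates over all valid log ratios.

    counts: {genus: read count} for one sample.
    coefficients: {marker genus: (intercept, slope)}; placeholder values here.
    Returns None when no ratio is valid (reference or all markers absent).
    """
    ref_reads = counts.get(reference, 0)
    estimates = []
    for marker, (intercept, slope) in coefficients.items():
        marker_reads = counts.get(marker, 0)
        # A ratio is valid only if both taxa have at least 1 read.
        if ref_reads < 1 or marker_reads < 1:
            continue
        log_ratio = math.log(ref_reads / marker_reads)  # assumed direction and base
        estimates.append(intercept + slope * log_ratio)
    if not estimates:
        return None
    return sum(estimates) / len(estimates)


def scale_molbv(raw_scores, upper=10.0):
    """Min-max scale a cohort of raw scores onto the 0..upper range."""
    lowest, highest = min(raw_scores), max(raw_scores)
    if highest == lowest:
        raise ValueError("cannot scale a cohort with identical scores")
    return [upper * (s - lowest) / (highest - lowest) for s in raw_scores]


# Placeholder coefficients for illustration only.
demo_coefficients = {"Prevotella": (5.0, -1.0), "Gardnerella": (4.0, -0.5)}
demo_sample = {"Lactobacillus": 100, "Prevotella": 100, "Gardnerella": 100}
demo_raw = molbv_raw(demo_sample, demo_coefficients)
demo_scaled = scale_molbv([-2.0, 3.0, 8.0])
```

Because the scaling uses the cohort minimum and maximum, a scaled score is relative to the cohort in which it was computed.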

Similar to Nugent scoring, ranges of the continuous molBV score are used to define BV status. A molBV score of 0–3 is considered negative for BV, 4–6 is considered intermediate, and a score of 7–10 is considered consistent with BV.
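A minimal sketch of this categorization is given below. Because molBV is continuous, values can fall between the integer ranges (e.g., 3.5 or 6.4); how such values are assigned is not stated in the text, so the cut-points at 4 and 7 used here are an assumption, as is the function name.

```python
def molbv_category(score):
    """Map a scaled molBV score (0-10) to a BV status label.

    Assumed cut-points: score < 4 is negative, 4 <= score < 7 is
    intermediate, and score >= 7 is consistent with BV.
    """
    if not 0 <= score <= 10:
        raise ValueError("scaled molBV is expected to lie between 0 and 10")
    if score < 4:
        return "negative"
    if score < 7:
        return "intermediate"
    return "BV"
```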

### Confirmation cohorts

Three cohorts were used to confirm the molBV classifier. Full details about sample collection and processing can be obtained from the cited studies. The United States (USA) confirmation cohort was composed of 388 women with collection from three separate locations (two in Baltimore and one in Atlanta) with the women having a median age of 31 years40. The Cape Town cohort was composed of 90 women with a median age of 18 years42,44. The Soweto cohort was composed of 78 women with a median age of 18 years42,44.

### Cervical immune cytokines, chemokines, and soluble receptors

Cervical sponge samples were collected from women participating in the HPV Costa Rica Vaccine Trial (CVT) using a Merocel sponge (Medtronic Xomed, Jacksonville, FL) as previously described90. A customized panel including 32 cytokines, chemokines, and soluble receptors was quantitated using Luminex-based Milliplex Map Multiplex Assays (Millipore, Billerica, MA) as previously described91.

### HPV natural history exposure/outcome definitions

DNA from cervical samples from the placebo arm of the Costa Rica vaccine trial45 was used to test the prospective association of molecular BV and cervicovaginal inflammation with HPV natural history stages. One analyzed outcome was time to clearance of high-risk HPVs92 (i.e., HPV16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58 and/or 59). Women were selected from the CVT placebo arm where an incident high-risk HPV infection was detected (the visit 1 sample in this study). Clearance was recorded if a woman cleared all high-risk types detected at the incident visit within a 2-year observation window. The average time between collected-sample visits was 1.28 years. Data from the CVT trial45 was used to determine persistence status to ensure that the collected V2 sample fell within the observation window. The two core exposures were sustained high/low molBV and sustained high/low cervical inflammation as determined from IL-1β/IP-10 cytokine marker ratios. Inflammation and molBV sustained/persistent status were determined by median stratification; specifically, the sustained status categories refer to a per-protocol approach where the exposure status had to be similar across the two analyzed visits. For example, the median molBV value across all study visits was 5.4, and if a woman had a molBV value of 10 for visit 1 and 8 for visit 2, she would be placed in the sustained molBV high category. Participants with discordant molBV values were not included in this analysis in order to measure the per-protocol effect of sustained high/low molBV values in the context of HPV natural history (i.e., clearance or persistence). Excluded samples did not differ significantly from those retained in the analysis in terms of age, HPV16 positivity or smoking status (see Supplementary Tables 7.1 and 7.2). A second outcome analyzed was progression to CIN2+. 
CVT trial follow-up data was used to identify which of the study participants went on to develop CIN2+ after the V2 sample (average time to diagnosis was 2.68 years).
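The median-stratified, per-protocol exposure definition described above can be sketched as follows. The function name is hypothetical, and the handling of a value exactly equal to the median (treated here as low) is an assumption, since the text does not specify it.

```python
def sustained_status(visit1, visit2, median):
    """Classify an exposure measured at two visits against a cohort median.

    Returns "sustained high" or "sustained low" when both visits fall on
    the same side of the median, and None for discordant participants,
    who were excluded from the per-protocol analysis.
    """
    high1 = visit1 > median  # assumption: a value equal to the median is low
    high2 = visit2 > median
    if high1 and high2:
        return "sustained high"
    if not high1 and not high2:
        return "sustained low"
    return None


# Worked example from the text: median molBV 5.4, visits scored 10 and 8.
example = sustained_status(10, 8, median=5.4)
```

The same function applies unchanged to the cytokine-ratio exposures, given the corresponding cohort median.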

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

## Data availability

Sequence files and metadata for all samples used in this study have been uploaded to SRA (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA641099). The script used to calculate molBV, with instructions and sample test data, can be found on GitHub (https://github.com/musyk07/molBV). Source data are provided with this paper.

## References

1. Ravel, J., Moreno, I. & Simon, C. Bacterial vaginosis and its association with infertility, endometritis, and pelvic inflammatory disease. Am. J. Obstet. Gynecol. 224, 251–257 (2021).

2. Redelinghuys, M. J., Geldenhuys, J., Jung, H. & Kock, M. M. Bacterial vaginosis: current diagnostic avenues and future opportunities. Front. Cell. Infect. Microbiol. 10, 354 (2020).

3. Onderdonk, A. B., Delaney, M. L. & Fichorova, R. N. The human microbiome during bacterial vaginosis. Clin. Microbiol. Rev. 29, 223–238 (2016).

4. Koumans, E. H. et al. The prevalence of bacterial vaginosis in the United States, 2001–2004; associations with symptoms, sexual behaviors, and reproductive health. Sex. Transm. Dis. 34, 864–869 (2007).

5. Allsworth, J. E. & Peipert, J. F. Prevalence of bacterial vaginosis: 2001–2004 National Health and Nutrition Examination Survey data. Obstet. Gynecol. 109, 114–120 (2007).

6. Peebles, K., Velloza, J., Balkus, J. E., McClelland, R. S. & Barnabas, R. V. High global burden and costs of bacterial vaginosis: a Systematic Review and Meta-Analysis. Sex. Transm. Dis. 46, 304–311 (2019).

7. Bradshaw, C. S. & Sobel, J. D. Current treatment of bacterial vaginosis-limitations and need for innovation. J. Infect. Dis. 214, S14–S20 (2016).

8. Salah, R. M., Allam, A. M., Magdy, A. M. & Mohamed, A. Bacterial vaginosis and infertility: cause or association? Eur. J. Obstet. Gynecol. Reprod. Biol. 167, 59–63 (2013).

9. Brabant, G. [Bacterial vaginosis and spontaneous preterm birth]. J. Gynecol. Obstet. Biol. Reprod. 45, 1247–1260 (2016).

10. Tellapragada, C. et al. Risk factors for preterm birth and low birth weight among pregnant Indian women: a hospital-based prospective study. J. Prev. Med. Public Health 49, 165 (2016).

11. Bautista, C. T. et al. Bacterial vaginosis: a synthesis of the literature on etiology, prevalence, risk factors, and relationship with chlamydia and gonorrhea infections. Mil. Med. Res. 3, 4 (2016).

12. Rumyantseva, T., Khayrullina, G., Guschin, A. & Donders, G. Prevalence of Ureaplasma spp. and Mycoplasma hominis in healthy women and patients with flora alterations. Diagn. Microbiol. Infect. Dis. 93, 227–231 (2019).

13. Eastment, M. C. & McClelland, R. S. Vaginal microbiota and susceptibility to HIV. AIDS 32, 687–698 (2018).

14. McKinnon, L. R. et al. The evolving facets of bacterial vaginosis: implications for HIV transmission. AIDS Res. Hum. Retroviruses 35, 219–228 (2019).

15. King, C. C. et al. Bacterial vaginosis and the natural history of human papillomavirus. Infect. Dis. Obstet. Gynecol. 2011, 319460 (2011).

16. Usyk, M. et al. Cervicovaginal microbiome and natural history of HPV in a longitudinal study. PLoS Pathog. 16, e1008376 (2020).

17. Quan, L. et al. Simultaneous detection and comprehensive analysis of HPV and microbiome status of a cervical liquid-based cytology sample using Nanopore MinION sequencing. Sci. Rep. 9, 1–13 (2019).

18. Champer, M. et al. The role of the vaginal microbiome in gynaecological cancer. BJOG 125, 309–315 (2018).

19. Moscicki, A. B., Shi, B., Huang, H., Barnard, E. & Li, H. Cervical-vaginal microbiome and associated cytokine profiles in a prospective study of HPV 16 acquisition, persistence, and clearance. Front. Cell. Infect. Microbiol. 10, 569022 (2020).

20. Brusselaers, N., Shrestha, S., van de Wijgert, J. & Verstraelen, H. Vaginal dysbiosis and the risk of human papillomavirus and cervical cancer: systematic review and meta-analysis. Am. J. Obstet. Gynecol. 221, 9–18 e18 (2019).

21. Brookheart, R. T., Lewis, W. G., Peipert, J. F., Lewis, A. L. & Allsworth, J. E. Association between obesity and bacterial vaginosis as assessed by Nugent score. Am. J. Obstet. Gynecol. 220, 476 e471–476 e411 (2019).

22. Amsel, R. et al. Nonspecific vaginitis. Diagnostic criteria and microbial and epidemiologic associations. Am. J. Med. 74, 14–22 (1983).

23. Schwiertz, A., Taras, D., Rusch, K. & Rusch, V. Throwing the dice for the diagnosis of vaginal complaints? Ann. Clin. Microbiol. Antimicrob. 5, 4 (2006).

24. Nugent, R. P., Krohn, M. A. & Hillier, S. L. Reliability of diagnosing bacterial vaginosis is improved by a standardized method of gram stain interpretation. J. Clin. Microbiol. 29, 297–301 (1991).

25. Chaijareenont, K., Sirimai, K., Boriboonhirunsarn, D. & Kiriwat, O. Accuracy of Nugent’s score and each Amsel’s criteria in the diagnosis of bacterial vaginosis. J. Med. Assoc. Thail. 87, 1270–1274 (2004).

26. Hilbert, D. W. et al. Development and validation of a highly accurate quantitative real-time PCR assay for diagnosis of bacterial vaginosis. J. Clin. Microbiol. 54, 1017–1024 (2016).

27. Mohanty, S., Sood, S., Kapil, A. & Mittal, S. Interobserver variation in the interpretation of Nugent scoring method for diagnosis of bacterial vaginosis. Indian J. Med. Res. 131, 88–91 (2010).

28. Coleman, J. S. & Gaydos, C. A. Molecular diagnosis of bacterial vaginosis: an update. J. Clin. Microbiol. 56, e00342–00318 (2018).

29. Mitchell, C. & Marrazzo, J. Bacterial vaginosis and the cervicovaginal immune response. Am. J. Reprod. Immunol. 71, 555–563 (2014).

30. Alcaide, M. L. et al. A bio-behavioral intervention to decrease intravaginal practices and bacterial vaginosis among HIV infected Zambian women, a randomized pilot study. BMC Infect. Dis. 17, 1–10 (2017).

31. Turovskiy, Y., Sutyak Noll, K. & Chikindas, M. L. The aetiology of bacterial vaginosis. J. Appl. Microbiol. 110, 1105–1128 (2011).

32. Dabee, S., Passmore, J. -A. S., Heffron R. & Jaspan, H. B. The complex link between the female genital microbiota, genital infections and inflammation. Infect. Immun. 89, e00487-20 (2021).

33. Kenyon, C., Colebunders, R. & Crucitti, T. The global epidemiology of bacterial vaginosis: a systematic review. Am. J. Obstet. Gynecol. 209, 505–523 (2013).

34. Denney, J. M. & Culhane, J. F. Bacterial vaginosis: a problematic infection from both a perinatal and neonatal perspective. Semin. Fetal Neonatal Med. 14, 200–203 (2009).

35. Romero, R., Chaiworapongsa, T., Kuivaniemi, H. & Tromp, G. Bacterial vaginosis, the inflammatory response and the risk of preterm birth: a role for genetic epidemiology in the prevention of preterm birth. Am. J. Obstet. Gynecol. 190, 1509–1519 (2004).

36. Gillet, E. et al. Bacterial vaginosis is associated with uterine cervical human papillomavirus infection: a meta-analysis. BMC Infect. Dis. 11, 1–9 (2011).

37. Van Der Pol, W. J. et al. In silico and experimental evaluation of primer sets for species-level resolution of the vaginal microbiota using 16s ribosomal rna gene sequencing. J. Infect. Dis. 219, 305–314 (2018).

38. Usyk, M., Zolnik, C. P., Patel, H., Levi, M. H. & Burk, R. D. Novel ITS1 fungal primers for characterization of the mycobiome. mSphere 2, e00488–00417 (2017).

39. Fredricks, D. N., Fiedler, T. L. & Marrazzo, J. M. Molecular identification of bacteria associated with bacterial vaginosis. N. Engl. J. Med. 353, 1899–1911 (2005).

40. Ravel, J. et al. Vaginal microbiome of reproductive-age women. Proc. Natl Acad. Sci. USA 108(Suppl. 1), 4680–4687 (2011).

41. Kacerovsky, M. et al. Cervical Gardnerella vaginalis in women with preterm prelabor rupture of membranes. PLoS ONE 16, e0245937 (2021).

42. Lennard, K., et al. Microbial composition predicts genital tract inflammation and persistent bacterial vaginosis in South African adolescent females. Infect. Immun. 86, e00410-17 (2018).

43. Burton, J. P. & Reid, G. Evaluation of the bacterial vaginal flora of 20 postmenopausal women by direct (Nugent score) and molecular (polymerase chain reaction and denaturing gradient gel electrophoresis) techniques. J. Infect. Dis. 186, 1770–1780 (2002).

44. Dabee, S. et al. Defining characteristics of genital health in South African adolescent girls and young women at high risk for HIV infection. PLoS ONE 14, e0213975 (2019).

45. Herrero, R. et al. Rationale and design of a community-based double-blind randomized clinical trial of an HPV 16 and 18 vaccine in Guanacaste, Costa Rica. Vaccine 26, 4795–4808 (2008).

46. Morton, J. T. et al. Establishing microbial composition measurement standards with reference frames. Nat. Commun. 10, 2719 (2019).

47. Masson, L. et al. Inflammatory cytokine biomarkers to identify women with asymptomatic sexually transmitted infections and bacterial vaginosis who are at high risk of HIV infection. Sex. Transm. Infect. 92, 186–193 (2016).

48. Mitchell, C. M. et al. Vaginal microbiota and mucosal immune markers in women with vulvovaginal discomfort. Sex. Transm. Dis. 47, 269–274 (2020).

49. Srinivasan, S. et al. Bacterial communities in women with bacterial vaginosis: high resolution phylogenetic analyses reveal relationships of microbiota to clinical criteria. PLoS ONE 7, e37818 (2012).

50. Cartwright, C. P., Pherson, A. J., Harris, A. B., Clancey, M. S. & Nye, M. B. Multicenter study establishing the clinical validity of a nucleic-acid amplification-based assay for the diagnosis of bacterial vaginosis. Diagn. Microbiol. Infect. Dis. 92, 173–178 (2018).

51. France, M. T. et al. VALENCIA: a nearest centroid classification method for vaginal microbial communities based on composition. Microbiome 8, 166 (2020).

52. Coleman, J. S. & Gaydos, C. A. Molecular diagnosis of bacterial vaginosis: an update. J. Clin. Microbiol. 56, e00342-18 (2018).

53. Fettweis, J. M. et al. Differences in vaginal microbiome in African American women versus women of European ancestry. Microbiology 160, 2272–2282 (2014).

54. Łaniewski, P. et al. Features of the cervicovaginal microenvironment drive cancer biomarker signatures in patients across cervical carcinogenesis. Sci. Rep. 9, 1–14 (2019).

55. van de Wijgert, J. The vaginal microbiome and sexually transmitted infections are interlinked: consequences for treatment and prevention. PLoS Med. 14, e1002478 (2017).

56. Petrova, M. I., Reid, G., Vaneechoutte, M. & Lebeer, S. Lactobacillus iners: friend or Foe? Trends Microbiol. 25, 182–191 (2017).

57. Vaneechoutte, M. Lactobacillus iners, the unusual suspect. Res. Microbiol. 168, 826–836 (2017).

58. Schiffman, M. et al. Carcinogenic human papillomavirus infection. Nat. Rev. Dis. Prim. 2, 16086 (2016).

59. Menard, J. P., Fenollar, F., Henry, M., Bretelle, F. & Raoult, D. Molecular quantification of Gardnerella vaginalis and Atopobium vaginae loads to predict bacterial vaginosis. Clin. Infect. Dis. 47, 33–43 (2008).

60. Schiffman, M. & Wentzensen, N. From human papillomavirus to cervical cancer. Obstet. Gynecol. 116, 177–185 (2010).

61. Mitra, A. et al. Cervical intraepithelial neoplasia disease progression is associated with increased vaginal microbiome diversity. Sci. Rep. 5, 1–11 (2015).

62. Piyathilake, C. J. et al. Cervical microbiota associated with higher grade cervical intraepithelial neoplasia in women infected with high-risk human papillomaviruses. Cancer Prev. Res. 9, 357–366 (2016).

63. Laniewski, P. et al. Features of the cervicovaginal microenvironment drive cancer biomarker signatures in patients across cervical carcinogenesis. Sci. Rep. 9, 7333 (2019).

64. Schlecht, N. F. et al. Cervical, anal and oral HPV in an adolescent inner-city health clinic providing free vaccinations. PLoS ONE 7, e37419 (2012).

65. Schlecht, N. F. et al. Risk of delayed human papillomavirus vaccination in inner-city adolescent women. J. Infect. Dis. 214, 1952–1960 (2016).

66. Gutman, R. E., Peipert, J. F., Weitzen, S. & Blume, J. Evaluation of clinical methods for diagnosing bacterial vaginosis. Obstet. Gynecol. 105, 551–556 (2005).

67. Caporaso, J. G. et al. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. 6, 1621 (2012).

68. Wang, Y. & Qian, P. Y. Conservative fragments in bacterial 16S rRNA genes and primer design for 16S ribosomal DNA amplicons in metagenomic studies. PLoS ONE 4, e7401 (2009).

69. Rosenbaum, J. et al. Evaluation of oral cavity DNA extraction methods on bacterial and fungal microbiota. Sci. Rep. 9, 1531 (2019).

70. Schmieder, R. & Edwards, R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863–864 (2011).

71. Hercus, C. Novocraft Short Read Alignment Package. Website http://www.novocraft.com (2009).

72. Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).

73. Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).

74. Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl. Environ. Microbiol. 73, 5261–5267 (2007).

75. DeSantis, T. Z. et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072 (2006).

76. Chen, T. et al. The Human Oral Microbiome Database: a web accessible resource for investigating oral microbe taxonomic and genomic information. Database 2010, baq013 (2010).

77. Group, N. H. W. et al. The NIH human microbiome project. Genome Res. 19, 2317–2323 (2009).

78. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

79. Koljalg, U. et al. UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. N. Phytol. 166, 1063–1068 (2005).

80. McDonald, D. et al. The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome. GigaScience 1, 7 (2012).

81. Team, R. C. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).

82. McMurdie, P. J. & Holmes, S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE 8, e61217 (2013).

83. Team R. C. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2013).

84. Oksanen, J. et al. Vegan: Community Ecology Package. R Package Version 1.17-4. http://www.cranr-project.org (2010).

85. Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinforma. 12, 77 (2011).

86. Wickham H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2016).

87. Mandal, S. et al. Analysis of composition of microbiomes: a novel method for studying microbial composition. Micro. Ecol. Health Dis. 26, 27663 (2015).

88. Dabney, A., Storey, J. D. & Warnes, G. qvalue: Q-value estimation for false discovery rate control. R package version 1. (2010).

89. May, S. & Hosmer, D. W. A cautionary note on the use of the Grønnesby and Borgan goodness-of-fit test for the Cox proportional hazards model. Lifetime Data Anal. 10, 283–291 (2004).

90. Kemp, T. J. et al. Evaluation of two types of sponges used to collect cervical secretions and assessment of antibody extraction protocols for recovery of neutralizing anti-human papillomavirus type 16 antibodies. Clin. Vaccin. Immunol. 15, 60–64 (2008).

91. Koshiol, J. et al. Evaluation of a multiplex panel of immune-related markers in cervical secretions: a methodologic study. Int. J. Cancer 134, 411–425 (2014).

92. Amaro-Filho, S. M. et al. HPV73 a nonvaccine type causes cervical cancer. Int. J. Cancer 146, 731–738 (2020).

## Acknowledgements

We would like to thank Jo-Ann Passmore, Smritee Dabee, Katie Viljoen, Heather Jaspan, Christina Balle, and other members of the WISH cohort for providing the clinical Nugent scores for the Cape Town and Soweto confirmation sets. This work was supported in part by the National Institutes of Health, National Cancer Institute (CA78527 to R.D.B.), the National Institute of Allergy and Infectious Disease (AI072204 to MPIs A.D., R.D.B., N.S.) and the Albert Einstein Cancer Research Center (P30CA013330, PI Ed Chu).

CVT cohort declaration

Investigators in the International Agency for Research on Cancer/World Health Organization: Where authors are identified as personnel of the International Agency for Research on Cancer/World Health Organization, the authors alone are responsible for the views expressed in this article and they do not necessarily represent the decisions, policy or views of the International Agency for Research on Cancer/World Health Organization.

We extend a special thanks to the women of Guanacaste and Puntarenas, Costa Rica, who gave of themselves in participating in this effort. In Costa Rica, we acknowledge the tremendous effort and dedication of the staff involved in this project; we would like to specifically acknowledge the meaningful contributions by Carlos Avila, Loretto Carvajal, Rebeca Ocampo, Cristian Montero, Diego Guillen, Jorge Morales and Mario Alfaro. In the United States, we extend our appreciation to the team from Information Management Services (IMS) responsible for the development and maintenance of the data system used in the trial and who serve as the data management center for this effort, especially Jean Cyr, Julie Buckland, John Schussler, and Brian Befano. We thank Dr. Diane Solomon (CVT: medical monitor & QC pathologist) for her invaluable contributions during the randomized blinded phase of the trial and the design of the LTFU and Nora Macklin (CVT) and Kate Torres (LTFU) for the expertise in coordinating the study. We thank the members of the Data and Safety Monitoring Board charged with protecting the safety and interest of participants during the randomized, blinded phase of our study (Steve Self, Chair, Adriana Benavides, Luis Diego Calzada, Ruth Karron, Ritu Nayar, and Nancy Roach) and members of the external Scientific HPV Working Group who have contributed to the success of our efforts over the years (Joanna Cain and Elizabeth Fontham, Co-Chairs, Diane Davey, Anne Gershon, Elizabeth Holly, Silvia Lara, Henriette Raventós, Wasima Rida, Richard Roden, Maria del Rocío Sáenz Madrigal, Gypsyamber D’Souza, and Margaret Stanley). The Costa Rica HPV Vaccine Trial is a long-standing collaboration between investigators in Costa Rica and the NCI. The trial is sponsored and funded by the NCI (contract N01-CP-11005), with funding support from the National Institutes of Health Office of Research on Women’s Health. 
GlaxoSmithKline Biologicals (GSK) provided vaccine and support for aspects of the trial associated with regulatory submission needs of the company under a Clinical Trials Agreement (FDA BB-IND 7920) during the four-year, randomized blinded phase of our study. John T. Schiller and Douglas R. Lowy report that they are named inventors on US Government-owned HPV vaccine patents that are licensed to GlaxoSmithKline and Merck and for which the National Cancer Institute receives licensing fees. They are entitled to limited royalties as specified by federal law. The other authors declare that they have no competing interests. The NCI and Costa Rica investigators are responsible for the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation of the manuscript. Registered with ClinicalTrials.gov: NCT00128661.

## Author information

### Contributions

M.U., N.F.S., and R.D.B. conceptualized and designed the study. N.S.F., H.D.S., R.D.B., A.N.S., A.D., and S.P. were involved with patient recruitment and clinical study design. R.H., C.P., L.P., and the CVT Consortium were involved in the CVT clinical sub-study design. C.C.S. and A.G. processed and prepared samples for sequencing. L.W. performed a literature review. M.S. was involved in cytokine analysis and review. M.U., S.V., and N.F.S. performed statistical analyses. M.U. and R.B. wrote the manuscript with the help of all co-authors. All authors read and edited the final manuscript.

### Corresponding author

Correspondence to Robert D. Burk.

## Ethics declarations

### Competing interests

The authors declare no competing interests.

### Ethics approval and Consent to participate

The study was approved by the Institutional Review Board at the Icahn School of Medicine at Mount Sinai, Manhattan, New York, and written informed consent was obtained from all study participants and from guardians accompanying minors before enrollment.

## Peer review information

Nature Communications thanks Dohun Pyeon, Hans Verstraelen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Rights and permissions

Usyk, M., Schlecht, N.F., Pickering, S. et al. molBV reveals immune landscape of bacterial vaginosis and predicts human papillomavirus infection natural history. Nat Commun 13, 233 (2022). https://doi.org/10.1038/s41467-021-27628-3

• DOI: https://doi.org/10.1038/s41467-021-27628-3