Introduction

A healthy cervicovaginal microbiome consists predominantly of lactobacilli, which are thought to restrict growth of pathogenic bacteria by maintaining a low vaginal pH (Hickey et al., 2012; Petrova et al., 2013). Clinical conditions associated with an altered cervicovaginal microbiome include bacterial vaginosis (BV) and vaginal candidiasis. BV is typically diagnosed using Amsel criteria (Amsel et al., 1983) or Gram stain Nugent scoring (Nugent et al., 1991), and vaginal candidiasis by microscopy and/or culture of Candida species.

The human cervicovaginal microbiome has an important role in protecting women and neonates from diseases, such as pelvic inflammatory disease (Taylor et al., 2013), preterm birth (Li et al., 2012) and sepsis (Verani et al., 2010). Current epidemiological evidence using the above-mentioned diagnostic methods also suggests that any deviation from a lactobacilli-dominated cervicovaginal microbiome increases women’s susceptibility to HIV and, in HIV-positive women, genital HIV shedding (Sha et al., 2005; van de Wijgert et al., 2008; Hayes et al., 2010). Furthermore, relationships between BV by Nugent scoring and herpes simplex virus type 2 (HSV-2; Cherpes et al., 2003), Neisseria gonorrhoeae, Chlamydia trachomatis (Wiesenfeld et al., 2003) and Trichomonas vaginalis (Rathod et al., 2011) have been described.

In the last decade, phylogenetic analyses of cervicovaginal samples (mostly bacterial 16S ribosomal RNA gene sequencing) have shown that bacterial communities in the vagina are more complex than previously thought (Hummelen et al., 2010; Ling et al., 2010; Ravel et al., 2011; Schellenberg et al., 2011; Srinivasan et al., 2012). American studies identified five microbiome clusters (dominated by Lactobacillus iners, L. crispatus, L. gasseri, L. jensenii and a mixture of anaerobic bacteria, respectively) in asymptomatic women (Ravel et al., 2011; Gajer et al., 2012) and multiple clusters consisting of mixtures of anaerobic bacteria, not dominated by a single taxon, in women with BV (Srinivasan et al., 2012). These studies have led to the view that BV is a condition characterized by polymicrobial dysbiosis (Hickey et al., 2012).

Molecular methods to study the cervicovaginal microbiome have not yet been used in large epidemiological studies with clinical outcomes due to limited availability and high costs. A few exploratory studies have described differences between HIV-positive and HIV-negative women, and most found trends toward decreased lactobacilli and increased bacterial diversity (Spear et al., 2008; Mitchell et al., 2009; Dols et al., 2011; Pépin et al., 2011; Schellenberg et al., 2011; Dols et al., 2012; Mitchell et al., 2013).

The long-term consequences of cervicovaginal dysbiosis are most prevalent in resource-poor countries, African countries in particular, but comprehensive molecular microbiome data from these countries are scarce. We, therefore, sought to describe cervicovaginal microbiome compositions of women at high risk of HIV in Kigali, Rwanda, using a phylogenetic microarray, and to correlate these compositions with the presence of HIV, other sexually transmitted infections (STIs), clinical signs and symptoms, and demographic and behavioral characteristics.

Materials and methods

Study design

The Kigali HIV Incidence Study (KHIS) was a prospective cohort study conducted in 2007–2008 at Rinda Ubuzima, a non-governmental research clinic in Kigali, Rwanda. The study was approved by the National Ethics Committee, Rwanda, and the Columbia University Medical Center Review Board, USA. All participants provided written informed consent. The study design and procedures were described elsewhere (Braunstein et al., 2011; Supplementary Information). Briefly, 800 female sex workers were screened for HIV and 397 HIV-negative women were retested every three months for 1 year to determine the HIV incidence. All cohort participants, plus 141 women who tested HIV-positive at screening, were seen again once during the second year. Socio-demographic, behavioral and clinical data were collected by face-to-face interviewing at all visits. Pelvic examinations, including cervical sampling for microarray analyses, were done at the month-6 (M6) and year-2 (Yr2) visits. Diagnostic tests for STIs, BV (Amsel and Nugent), candidiasis (microscopy), pregnancy and cervical cytology were conducted at regular intervals throughout the study (Supplementary Information). All participants received counseling and non-spermicidal condoms free of charge. Women who tested positive for curable STIs were treated by study clinicians. Women who tested HIV-positive received a CD4 count. Women who tested positive for HIV or pregnancy, or had abnormal cervical cytology, were referred to public clinics providing appropriate care.

Cervical samples were taken from all women who attended the M6 and Yr2 visits, and 202 of these 719 samples were selected for microarray testing. To increase our statistical power for comparing microbiota of women with and without genital pathogens, we oversampled women with HIV and other STIs, but we also included a random sample of women without genital pathogens. All women contributed one sample, except for 28 women who contributed two samples each. In total, 6 out of 202 samples did not show appropriate positive control signals, resulting in the availability of 196 samples of 174 women for analysis. All 196 samples were used for the clustering and ecological analyses, but only one sample per participant (for a total of 174 samples) was used for all other analyses.

Microarray design

The microarray contained 461 DNA hybridization probes targeting microorganisms and 164 positive (16S conserved regions) and negative controls (Supplementary Information) (Dols et al., 2011, 2012). Of the 251 probes that generated a consistent signal with a signal/background (S/B) ratio >5, 66 16S probes were species-specific, 56 16S probes targeted multiple bacterial species within one genus, 36 16S probes were specific at family or order level, 69 targeted higher taxonomic levels, 5 were groEL probes, 16 were 18S probes and 3 were viral probes. We focused our clustering analyses on these 251 probes, and all additional analyses on the 122 16S probes generating species or genus-specific signals. A probe targeting a bacterium classified by the Ribosomal Database Project as an uncultured bacterium in the Lachnospiraceae family matched perfectly with a bacterium recently named BV-associated bacterium 1 (BVAB1) in Genbank (Genbank entry AY724739.1) (Fredricks et al., 2005). We refer to it as BVAB1 and included it in the 122 species/genus-specific probes. We made a few other probe assumptions on the basis of existing knowledge about bacterial presence in the vagina (Supplementary Information).

We used normalized S/B ratios to estimate bacterial loads, referred to from here onward as ‘semi-quantitative abundance’ or ‘abundance’. For probes targeting different species within one genus, S/B ratios were summed to calculate genera-based microbial ecology parameters (see further). For probes targeting the same genus, we chose one probe that targeted the most species within that genus. For total abundance per sample, we summed genus-specific abundance of all genera present in that sample.

Microarray testing and data processing

DNA was extracted using the AGOWA mag Mini DNA isolation kit (AGOWA GmbH, Berlin, Germany) and bead beating in a BeadBeater (BioSpec Products Inc., Bartlesville, OK, USA; Supplementary Information). Microarray sample preparation, labeling, amplification and hybridization were described previously (Dols et al., 2011; Supplementary Information).

Imagene 5.6 software (BioDiscovery, Marina del Rey, CA, USA) was used to quantify the signal (S) by calculating the mean of all pixel values in the spot, as well as the background (B) surrounding the spot. If S was not confidently above B (S>B+2*s.d. of B) (Quackenbush, 2002), the S/B ratio was set at 1. Samples for which the positive controls showed a low S/B ratio (mean S/B ratio >2*s.d. lower than the mean level of positive controls of all samples) were excluded from the analysis. Lowess smoothing was performed for slide normalization (Quackenbush, 2002).
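The thresholding and sample-exclusion rules described above can be illustrated with a minimal Python sketch. It assumes that the spot signal, the background mean and the background s.d. are already available per spot; the function names are hypothetical and do not correspond to the Imagene software, and the use of the sample s.d. for the control cutoff is an assumption, as the text does not specify it.

```python
from statistics import mean, stdev


def sb_ratio(signal, background, background_sd):
    """Return the S/B ratio, or 1.0 when S is not confidently above B.

    'Confidently above' follows the rule in the text: S > B + 2 * s.d. of B.
    """
    if signal > background + 2 * background_sd:
        return signal / float(background)
    return 1.0


def low_control_samples(control_means):
    """Flag samples whose mean positive-control S/B ratio lies more than
    2 s.d. below the mean across all samples.

    control_means maps a (hypothetical) sample ID to its mean
    positive-control S/B ratio. The sample s.d. is an assumption.
    """
    values = list(control_means.values())
    cutoff = mean(values) - 2 * stdev(values)
    return [sample for sample, value in control_means.items() if value < cutoff]
```

Setting non-confident spots to a ratio of 1 means that absent taxa contribute a value of 0 after log2 transformation.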

HIV-1 RNA in cervicovaginal lavages

The pelvic examination at the Yr2 visit included a cervicovaginal lavage (CVL): the left and right fornix and cervical os were irrigated twice with 5 ml normal saline, which was aspirated after 30 s; a median volume of 5.5 ml (range 3.8–7.5 ml) was recovered. The CVL fluid was immediately placed on ice or at 2–8 °C, and centrifuged at 1000 rpm for 10 min within 4 h of collection. Cell pellets and aliquots of supernatant were stored at −80 °C until testing. Of the 64 HIV-positive women with a microarray result, 61 had a CVL supernatant sample for HIV-1 RNA viral load testing available, and 58 had valid test results. In addition, 50 HIV-positive women without a microarray result had a valid HIV-1 RNA test result. In total, 500 μl CVL fluid was mixed with 500 μl PBS and subjected to nucleic acid isolation, amplification and detection using COBAS AmpliPrep/COBAS Taqman v2.0 according to the manufacturer’s instructions (Roche Molecular Systems, Branchburg, NJ, USA).

Statistical analysis

Statistical analyses were performed using Python 2.7 (Python Software Foundation, Beaverton, OR, USA), MATLAB (R2012a, The MathWorks, Natick, MA, USA), STATA release 12 (StataCorp, College Station, TX, USA) and MS Excel (Microsoft, Redmond, WA, USA).

We used neighborhood co-regularized multi-view spectral clustering of normalized log2-transformed S/B ratios to identify cervicovaginal microbiome clusters (Ng et al., 2001; Kumar et al., 2011; Tsivtsivadze et al., 2013). This modified spectral method generates and combines multiple clustering possibilities and leads to a result that is more robust compared with that of standard clustering techniques. Furthermore, the method captures complex neighborhood-based interactions in the dataset. We used Gaussian similarity measures to calculate a co-occurrence matrix and to identify the number of clusters in the dataset (Strehl and Ghosh, 2003). For additional analyses using the microbiome clusters as the outcome or main predictor, we only included women who had more than 70% probability of belonging to a cluster as determined by probabilistic decomposition of the co-occurrence matrix (147 of 174 women).
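Two steps of this procedure, building a co-occurrence matrix from repeated clustering runs and assigning samples to clusters by a probability cutoff, can be sketched in plain Python. This is not the neighborhood co-regularized multi-view spectral clustering method itself: the cluster labels per run and the membership probabilities are taken as given inputs, and the function names are hypothetical. The text gives the cutoff both as "more than 70%" and as "at least 70%"; the sketch assumes the latter.

```python
def co_occurrence(runs):
    """Fraction of clustering runs in which each pair of samples shares a label.

    runs is a list of label lists, one per clustering run, all of equal length.
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / float(len(runs)) for c in row] for row in counts]


def assign(membership, threshold=0.70):
    """Return the index of the most probable cluster, or None when its
    probability is below the threshold (the sample stays unassigned).

    membership holds one probability per cluster for a single sample.
    Treating exactly 70% as assigned is an assumption.
    """
    best = max(range(len(membership)), key=lambda k: membership[k])
    return best if membership[best] >= threshold else None
```

Because labels are only compared within a run, the co-occurrence matrix is unaffected by arbitrary relabeling of clusters between runs.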

We used previously described methods to determine microbial ecology parameters per sample: richness, defined as the median number of genera per sample (Marzorati et al., 2008); the Shannon diversity index (Shannon, 1948); and evenness, expressed as a Co-value with 0 representing complete evenness and 100 as complete unevenness (Marzorati et al., 2013). We focused the evenness calculations on the five most abundant bacteria in each cluster to reduce the influence of the long tail of minority species (Marzorati et al., 2013). To compare cumulative Co-values per cluster, an average sample per cluster was generated by calculating median S/B ratios per genus across the samples in that cluster.
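Per-sample richness and Shannon diversity can be computed as in the following minimal Python sketch. It assumes that a genus counts as present when its summed S/B ratio exceeds 1 (the value assigned to non-confident signals) and that the Shannon index uses the natural logarithm; neither choice is stated in the text, and the function names are hypothetical. The Co-value is not reproduced here.

```python
import math


def richness(abundances, absent_value=1.0):
    """Number of genera whose abundance exceeds the 'absent' value.

    abundances maps a genus name to its semi-quantitative abundance.
    Using S/B > 1 as the presence criterion is an assumption.
    """
    return sum(1 for value in abundances.values() if value > absent_value)


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over positive abundances.

    abundances is a list of non-negative abundances for one sample.
    The natural logarithm is assumed.
    """
    total = float(sum(abundances))
    if total == 0:
        return 0.0
    proportions = [value / total for value in abundances if value > 0]
    return -sum(p * math.log(p) for p in proportions)
```

A sample dominated by a single genus yields an index close to 0, whereas a sample with k equally abundant genera yields ln(k).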

To assess positive and negative correlations between bacteria, we first determined Spearman correlation coefficients (with Bonferroni correction) between genera, followed by those between species within the genera that were statistically significantly correlated plus species that are not yet classified at genus level, such as BVAB1. To correlate microarray findings with Nugent scores, the S/B ratios of 12 Lactobacillus probes (covering 70 species) were summed, the S/B ratios of one G. vaginalis probe and one Bacteroides fragilis probe were summed and one M. mulieris probe was used.
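The rank correlation and multiple-testing correction can be sketched in plain Python as follows. The sketch computes Spearman's coefficient as the Pearson correlation of average ranks and derives a Bonferroni-corrected significance level for all pairwise comparisons. The number of comparisons actually corrected for is not stated in the text, so treating it as all pairs of taxa is an assumption, as are the function names; P-value computation is omitted.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient: the Pearson correlation of the two rank vectors.

    Undefined (division by zero) when either vector is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def bonferroni_alpha(n_taxa, alpha=0.05):
    """Per-comparison significance level when all pairs of taxa are tested."""
    n_pairs = n_taxa * (n_taxa - 1) // 2
    return alpha / n_pairs
```

Under this assumption, a larger number of taxa gives a stricter per-comparison threshold, which is the motivation for testing genera first and species only within significantly correlated genera.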

Characteristics of the total study population, women included in the microarray experiments and women in each cervicovaginal microbiome cluster were described using the median and interquartile range (IQR) for continuous data and counts and percentages for categorical data. Women in clusters R-III, R-IV and R-VI (each characterized by a high diversity of anaerobic bacteria) were pooled in some analyses to increase statistical power. Differences in correlates between clusters, and between samples assigned and those not assigned to clusters, were assessed by the two-sided Kruskal–Wallis test for continuous data, Fisher’s exact test for categorical data, and the ‘nptrend’ function in STATA, an extension of the Wilcoxon rank-sum test, for trends (Cuzick, 1985).

Determinants of detectable HIV-1 RNA in CVL samples, including microarray clusters, were assessed by logistic regression. In these analyses, women in clusters R-I and R-II were pooled because only one woman in cluster R-I was HIV-positive. All models were adjusted for plasma CD4 count because advanced HIV infection (characterized by a low CD4 count) is associated with high plasma and genital viral load.

Results

Study sample

The prevalence of HIV in the KHIS study at screening was 24%, and 19 participants seroconverted during follow-up (Braunstein et al., 2011). Other viral and bacterial STIs, BV and candidiasis were common (Table 1). By design, the 174 women selected for microarray analyses had a higher prevalence of HIV, STIs and pregnancy than the original sample of 800 women. The prevalence of BV by Amsel criteria or Nugent scoring was not different between the microarray sample and the original sample.

Table 1 Participant characteristics: total study population versus microarray sample

Most of the selected women (median age 27, range 18–47) had never been married (75%) and did not complete primary school (66%) (Table 1). Although almost all women (99%) self-identified as sex workers and had worked as a sex worker for a median of 3 years (IQR=2–5), 14% left sex work during the follow-up (Ingabire et al., 2012). Almost all women (92%) reported vaginal finger-cleansing using water or water and soap, and 37% reported consistent condom use. Twenty-two women (13%) were pregnant at the time of sampling.

Microarray probe hybridization

The 122 species/genus-specific 16S probes represented 314 species from 51 genera, 32 families and 17 orders. On average, samples contained species from 11 genera (range 0–25). Most samples contained lactobacilli (98% of samples), including L. iners (74%), L. crispatus (16%), L. jensenii/L. salivarius/other (6%), L. gasseri/L. johnsonii/other (6%) and L. vaginalis/other (21%) (see Supplementary Information for other Lactobacillus spp. targeted by these probes). Leptotrichia (94%), Prevotella (91%), Corynebacterium (90%) and Gardnerella species (82%) were also present in most samples. However, lactobacilli other than L. iners and L. crispatus, Leptotrichia and Corynebacterium species were present in low abundance, and therefore did not play a significant role in the clustering (see further). Common BV-associated anaerobes were detected overall as follows: Atopobium (65% of samples), Dialister (61%), BVAB1 (50%), Mobiluncus (48%), Sneathia (47%) and Megasphaera (44%); their presence was low in the lactobacilli-dominated clusters and approached 100% in BV-associated clusters (see further). No or low signals were obtained for the Bifidobacteriaceae family other than Gardnerella, Bacteroides, Escherichia coli, Enterococcus, Streptococcus and Staphylococcus species.

Cervicovaginal microbiome clusters

We identified six cervicovaginal microbiome clusters by neighborhood co-regularized multi-view spectral clustering, using data from 196 samples and 251 probes as described above (Figure 1a). The co-occurrence matrix (Figure 1b), which shows how often samples co-occur in multiple spectral clustering analysis runs, visualizes these six clusters. We refer to these clusters as R-I to R-VI (with the R denoting ‘Rwanda’). The probabilistic decomposition of this matrix (Figure 1c), which returns the probability of each sample belonging to a specific cluster, confirms the presence of these six clusters. A total of 36 samples from 27 women had a probability of <70% of belonging to one of these six clusters. Samples that were not assigned (shown in white on the top bar of Figure 1a) fell between any two clusters and had little in common. The 27 women in the unassigned group had fewer lifetime pregnancies, and reported consistent condom use more often, but were otherwise similar to the 147 women in the other groups (Supplementary Table S1).

Figure 1

Co-occurrence matrix and microbiome clustering. (a) shows the six clusters obtained by Neighborhood Co-regularized Multi-view Spectral Clustering of microarray data. The white spaces between the clusters represent samples with less than 70% probability of belonging to a cluster (see also c). The co-occurrence score reflects how many times samples co-occurred in different clustering configurations. The matrix (b) shows all 196 samples on both the x- and y-axis in the same order. Only samples with at least 70% probability of belonging to a cluster are assigned to a cluster (c).

We used microbial ecological parameters (Figure 2) and a heatmap (Figure 3) to characterize each cluster. Cluster R-I was dominated by L. crispatus and cluster R-II by L. iners. Clusters R-III to R-VI were not dominated by one taxon, but contained several (facultative) anaerobes, with or without L. iners, in different compositions (see further). This was reflected in the evenness of the five most abundant genera, which was lower in clusters R-I and R-II compared with the other clusters (cumulative Co-values of 47 and 63, compared with 12, 9, 23 and 22, respectively; Figure 2a). The total semi-quantitative abundances and richness were also lower in clusters R-I and R-II than in clusters R-III, R-IV, R-V and R-VI (a median of 5 and 6 genera per sample compared with 18, 18, 14 and 17 genera, respectively; Figure 2b).

Figure 2

Distribution of the most abundant genera per cluster. (a) shows the cumulative Co-values for the most abundant genera in each cluster in descending order. Note that BVAB1 is not included because it is not yet described at genus level. Although the final cumulative Co-value of each cluster is around 60, clusters R-I and R-II have the most uneven distribution of their most abundant genera. (b) shows these most abundant genera per cluster, and highlights the dominance of lactobacilli in clusters R-I and R-II compared with a more even distribution of anaerobic genera in the other clusters. Also, large differences in richness and total S/B ratio (abundance) are apparent, with clusters R-I and R-II containing fewer genera and lower abundance than the other clusters.

Figure 3

Bacterial composition of the microbiome clusters. (a) shows the six clusters obtained by Neighborhood Co-regularized Multi-view Spectral Clustering of microarray data. The white spaces between the clusters represent samples with less than 70% probability of belonging to a cluster. (b) Heatmap, showing normalized S/B ratios on a log2 scale of the most abundant species/genera per cluster, as well as species/genera that have traditionally been associated with BV and have been frequently reported in the literature. (c) BV-status of the women by Nugent score at the time of sampling. The color key is on the right. (d) The Shannon diversity index for each sample. ¹Abbreviated probe name; additional targeted species in the same genus are listed in Supplementary Information.

Clusters R-III to R-VI contained high abundances of Gardnerella, Prevotella and Atopobium species and lower abundances of Dialister, Megasphaera and Mobiluncus species and BVAB1 (Figures 2 and 3). BVAB1 was mostly present in clusters R-III and R-IV. Cluster R-III’s unique feature was the presence of a lower abundance of L. iners than the other mixed anaerobic clusters. Clusters R-IV and R-VI included the above-mentioned anaerobes and L. iners, but R-IV was the only cluster containing high abundance of an uncultured bacterium in the Gardnerella genus (see Supplementary Information) and R-VI contained the highest levels of Prevotella species, including P. bivia. Cluster R-V had a lower total bacterial abundance than the other mixed anaerobic clusters.

Correlations between species and with diagnostic test results

Most anaerobic bacteria were positively correlated with each other but negatively correlated with Lactobacillus species (Supplementary Figure S1; Supplementary Information).

The microarray clusters correlated poorly with the Amsel criteria (Table 2). Only 14–33% of women in the mixed anaerobic microbiome clusters had >20% clue cells present on wet mount, and 11–14% had a positive whiff test. Furthermore, while 97–100% of women in the mixed anaerobic clusters had a vaginal pH >4.5, 55–60% of women in lactobacilli-dominated clusters also did. In contrast, the clusters correlated well with Nugent scores (Table 2). Clusters R-I and R-II were statistically significantly associated with a normal Nugent score of 0–3 and clusters R-III to R-VI with a BV Nugent score of 7–10. Women with intermediate microbiota (Nugent score 4–6) did not cluster together but were mostly included in clusters R-II and R-V. The probes representing the morphotypes included in Nugent scoring (G. vaginalis and Bacteroides fragilis combined, M. mulieris and 12 lactobacilli probes covering 70 species) also correlated well with the Nugent scores (all P<0.001; Supplementary Figure S2).

Table 2 Amsel criteria and Nugent scores by cervicovaginal microbiome cluster

Socio-demographic, behavioral and HIV/STI correlates of cervicovaginal microbiome clusters

Women belonging to each of the six clusters had similar socio-demographic and behavioral characteristics, except that women in the lactobacilli clusters were more likely to have ever been married than women in the mixed anaerobic clusters (Table 3 and Supplementary Table S1). Women in the pooled cluster R-III/R-IV/R-VI were significantly more likely to report genitourinary symptoms on the day of sampling than women in the other clusters. No significant differences in pelvic exam findings were found.

Table 3 Cervicovaginal microbiome cluster correlates

Women in the L. crispatus-dominated cluster R-I were statistically significantly less likely to have HIV, HSV-2, any HPV and high-risk HPV than women in the other clusters, and had no bacterial STIs (Table 3; Figure 4). Only one woman in cluster R-I was HIV-positive (9%) compared with 33–56% of women in the other clusters (P=0.03). The percentages were 36% versus 78–88% for HSV-2 (P<0.01), 9% versus 42–60% for any HPV (P=0.02), 0% versus 38–54% for high-risk HPV (P<0.01) and 0% versus 38–61% for bacterial STIs (T. pallidum, N. gonorrhoeae, C. trachomatis and T. vaginalis combined; P=0.15). Statistically significant trends in prevalence of HIV and other STIs were found from low prevalence in cluster R-I to higher prevalence in clusters R-II and R-V, and highest prevalence in the pooled cluster R-III/R-IV/R-VI (Table 3; Figure 4).

Figure 4

Associations between microbiome clusters and HIV/STI prevalence. Women in the Lactobacillus crispatus-dominated cluster R-I had a lower prevalence of viral STIs than women in the other clusters (P<0.01) and had no bacterial STIs (P=0.15). A trend of increasing prevalence of viral STIs in clusters with increasing bacterial diversity was found, with the lowest prevalence in cluster R-I and increasing prevalence in clusters R-II, R-V and R-III/R-IV/R-VI, respectively (Ptrend<0.01). *No bacterial STIs were found in women assigned to cluster R-I. ¹Only tested at M6 in the HIV-negative cohort; data available for 61 women.

Microbiome clusters and HIV-1 RNA genital tract shedding

The prevalence of HIV-1 RNA in CVLs of HIV-positive women also increased with increasing microbiome diversity: 10% (two women) in the pooled cluster R-I/R-II, 40% (four women) in cluster R-V and 42% (eight women) in the pooled cluster R-III/R-IV/R-VI (Ptrend=0.03). The adjusted odds of having detectable HIV-1 RNA (adjusted for CD4 count) were 8.78 (95% confidence interval (CI)=1.12–69.09) times higher for women in cluster R-V and 5.29 (95% CI=0.91–30.67) times higher for women in the pooled cluster R-III/R-IV/R-VI compared with women in the pooled cluster R-I/R-II. After adjusting for CD4 count, positive correlations were found between HIV-1 RNA concentrations and abundance of several BV-associated probes and a negative association with a Lactobacillus genus probe (Supplementary Information).

In bi-variable logistic regression models using data from 108 HIV-positive women with an available HIV-1 RNA measurement, other factors associated with detectable genital HIV-1 RNA levels were genital itching (odds ratio (OR)=17.55, 95% CI=2.10–146.46), other specific genital symptoms (100 versus 31%, P<0.001), abundant cervical mucus on pelvic exam (OR=8.00, 95% CI=0.86–74.37), a BV Nugent score of 7–10 (OR=4.67, 95% CI=1.44–15.16), not using antiretroviral therapy (ART, data available for 101 women, OR=8.71, 95% CI=2.76–27.5) and HPV infection (OR=3.62, 95% CI=1.25–10.48), but not HSV-2 antibodies. Similar proportions of women in each cluster used ART (42%, 43% and 31% of women in R-I/R-II, R-V and R-III/R-IV/R-VI, respectively; Ptrend=0.5) and clustering was not associated with ART.

Discussion

We identified six cervicovaginal microbiome clusters in Rwandan sex workers. To date, 10 studies have described clusters on the basis of next generation sequencing data (Forney et al., 2010; Hummelen et al., 2010; Ravel et al., 2011; Schellenberg et al., 2011; Frank et al., 2012; Martin et al., 2012; Smith et al., 2012; Srinivasan et al., 2012; Drell et al., 2013; Lee et al., 2013). Even though these studies included different study populations and employed a variety of molecular and clustering procedures, consistent clustering patterns can be discerned. The majority of studies, including ours, found one cluster dominated by L. crispatus and one by L. iners. Clusters dominated by L. jensenii or L. gasseri were reported less frequently, and we did not find them in our study. All studies identified at least one cluster that was not dominated by a single taxon, but contained mixtures of anaerobes with or without Lactobacillus species. Clusters dominated by facultative anaerobic organisms, including streptococci, staphylococci and E. coli/Shigella species, were rarely reported, but the above-mentioned next generation sequencing studies suggest that these taxa are present in up to 30% of women in low abundance. This is also in agreement with our findings. Based on the above, we consider BV (represented by our clusters R-III to R-VI) as a state of polymicrobial dysbiosis. We speculate that whether the dysbiosis is symptomatic or not depends on the degree and nature of the dysbiosis, total bacterial loads, and the intensity and nature of the host’s immune responses. Although most BV-associated bacteria are not pathogenic in immune-competent hosts, we also speculate that some (e.g., streptococci and E. coli) might lead to invasive disease when present at sufficiently high levels. However, none of this has been definitively shown in clinical studies to date.

Our study showed that women in the L. crispatus cluster had the lowest prevalence of HIV/STIs, with a slight increase in the L. iners cluster and a significant increase in the dysbiotic clusters. A similar trend was found for HIV-1 RNA shedding in the genital tract of HIV-positive women. These findings are in agreement with the majority of studies that have also investigated these associations (Spear et al., 2008; Mitchell et al., 2009; Dols et al., 2011; Pépin et al., 2011; Schellenberg et al., 2011; Dols et al., 2012; Mitchell et al., 2013), but we are the first to report these associations for multiple STI pathogens and to demonstrate a dose-response relationship. It is important to note that the L. crispatus cluster only contained 11 women, and that the temporality of our findings is unclear because our study was cross-sectional. Our results, if confirmed in prospective studies, might imply that not only symptomatic dysbiosis should be treated, but also asymptomatic dysbiosis (defined here as a microbiome not dominated by lactobacilli) in certain risk groups. These risk groups might include women at high risk for HIV/STIs or adverse pregnancy outcomes and women with recurrent dysbiosis.

Laboratory studies also suggest that L. crispatus might protect against pathogens. L. crispatus is an efficient lactic acid producer (Hickey et al., 2012), produces antimicrobial compounds (Graver and Wade, 2011; Hickey et al., 2012; Aldunate et al., 2013; Petrova et al., 2013) and inhibits inflammation (Rose et al., 2012; Petrova et al., 2013). The latter is particularly important in the context of HIV transmission, as HIV infects CD4+ immune cells that are recruited to the genital mucosa when inflammation is present. In contrast, BV-associated bacteria could increase HIV-infection risk and HIV replication in the genital mucosa of HIV-infected women, by provoking local immune activation and/or disruption of the vaginal epithelium (Sha et al., 2005; Marconi et al., 2013; Mitchell et al., 2013; Petrova et al., 2013). In vitro studies have indeed shown that some BV-associated bacteria can enhance HIV expression, translation and/or replication (Klebanoff and Coombs, 1991; Hashemi et al., 1999; Ahmed et al., 2010).

Our data support the hypothesis that L. iners is less efficient than L. crispatus in preventing BV and other adverse reproductive health outcomes (Verstraelen et al., 2009; Srinivasan et al., 2010; Gajer et al., 2012; Jespers et al., 2012; Santiago et al., 2012). Recent genomic and transcriptomic studies suggest that L. iners is highly adapted to the vaginal compartment (Macklaim et al., 2011). However, it differentially expresses over 10% of its genome in dysbiotic compared with healthy states, with increased expression of a cytolysin, mucin, glycerol transport and related metabolic enzymes (Macklaim et al., 2013). These changes likely result in the production of succinate and other short-chain fatty acids as the end product of metabolism as opposed to lactic acid, leading to an increased vaginal pH. L. iners might also be the first Lactobacillus species to recover after dysbiosis (Gajer et al., 2012), which suggests a bidirectional relationship between L. iners and vaginal pathogens or dysbiosis. Like lactobacilli, G. vaginalis and Prevotella species are almost always present in the vaginal microbiome, but in much higher abundance in BV; some studies have noted a synergistic effect between them, perhaps due to metabolic dependencies (Ling et al., 2010; Zozaya-Hinchliffe et al., 2010; Ravel et al., 2011; Jespers et al., 2012). Several subspecies of G. vaginalis have been described, with different levels of epithelial adhesion capacity (Paramel Jayaprakash et al., 2012; Castro et al., 2013). The latter is the first step toward biofilm formation, which is thought to be an important mechanism of BV persistence (Machado et al., 2013). A. vaginae, which is also present in high abundance in BV, most likely has an important role in biofilm formation as well (Machado et al., 2013).

Some limitations of our study (such as its cross-sectional nature) and the field in general (heterogeneity in populations and methodology) have already been mentioned. Other limitations of our study include small sample sizes of some of the comparison groups (most notably the limited number of bacterial STI cases despite oversampling of these cases), imprecise timing of certain behaviors around the time of sampling, lack of a control group of women at low risk for HIV and STIs and the fact that we only used cervical samples (including endo- and ectocervix). Although small differences between the cervical and vaginal microbiome have been described, the types and relative quantities of the most abundant bacteria are similar between the two sampling sites and bacterial community compositions are also similar (Nikolaitchouk et al., 2008; Kim et al., 2009; Ravel et al., 2011; Smith et al., 2012).

We used a microarray instead of next generation sequencing or quantitative PCRs for several reasons. First, microarrays can assess the presence of multiple bacteria in a semi-quantitative manner. The number of probes included on the microarray can be reduced as knowledge about optimal and suboptimal vaginal microbiota increases, which may eventually result in a diagnostic tool. The main limitations of microarrays are their inability to detect ‘new’ species that are not included a priori on the microarray and the fact that they are not fully quantitative. However, to obtain fully quantitative data, one would have to perform multiple qPCR assays, which is time-consuming and expensive.

Now that vaginal microbiota of women with and without BV in different parts of the world have been well described, and molecular techniques have become more accessible and affordable, we believe that the time has come to incorporate these techniques into larger epidemiological studies with clinical outcomes. These studies should investigate the temporal relationships between cervicovaginal microbiota and adverse reproductive health outcomes, including adverse pregnancy outcomes and invasive infections in women and their neonates. They should also address other unanswered clinical questions, such as the role of bacterial loads of different types of bacteria in these adverse outcomes. At the same time, laboratory studies should further investigate the functional characteristics of different microbiome communities to improve our understanding of the etiology of dysbiosis and the pathogenesis of its clinical consequences. Eventually, interventions that restore and maintain lactobacilli-dominated microbiota, and particularly L. crispatus-dominated microbiota, should continue to be optimized and tested. If successful and affordable interventions are identified, they could potentially have a significant public health impact. For example, while studies have shown that cervicovaginal dysbiosis increases the risk of HIV acquisition only about 1.5–2.0-fold (compared with about 2.0–3.0-fold for STI pathogens), the overall population impact would be large because of the very high prevalence of dysbiosis (30–60%) in areas with generalized HIV epidemics (Hayes et al., 2010). Our study and other recent molecular vaginal microbiome studies have provided important new insights into the cervicovaginal microenvironment, and as a result, potential public health interventions can now be properly evaluated.
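The population-impact argument can be made concrete with Levin's population attributable fraction, PAF = p(RR − 1) / (1 + p(RR − 1)), where p is the prevalence of the exposure and RR the relative risk. The following Python sketch is an illustration only and was not part of the study's analyses: it assumes that the 1.5–2.0 figure can be read as a causal, unconfounded relative risk and that the 30–60% prevalence applies to the whole population at risk. The function name is hypothetical.

```python
def attributable_fraction(prevalence, relative_risk):
    """Levin's population attributable fraction.

    prevalence is the proportion of the population exposed (0 to 1);
    relative_risk is the risk ratio in the exposed versus the unexposed.
    Assumes the association is causal and unconfounded.
    """
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Bounds taken from the ranges quoted in the text.
low = attributable_fraction(0.30, 1.5)
high = attributable_fraction(0.60, 2.0)
```

Under these assumptions, roughly 13% to 38% of HIV acquisitions would be attributable to dysbiosis, which illustrates how a modest relative risk combined with a very common exposure can produce a large population-level effect.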