Vaginal microbiota diversity and paucity of Lactobacillus species are associated with persistent hrHPV infection in HIV negative but not in HIV positive women

The vaginal microbiota is thought to play a role in modulating risk of high-risk human papillomavirus (hrHPV) infection. We examined the relationship between the vaginal microbiota and persistent hrHPV infection in HIV-negative and HIV-positive women. We used 16S-rRNA sequencing to characterize the vaginal microbiota of two serial samples taken six months apart from 211 Nigerian women (67%, 142/211 HIV-positive and 33%, 69/211 HIV-negative) and evaluated the association between the vaginal microbiota and persistent hrHPV infection using generalized estimating equation logistic regression models and linear discriminant analysis effect size (LEfSe) algorithm to identify phylotypic biomarkers of persistent hrHPV infection. The high diversity microbiota, Community State Type IV-B, was the most prevalent in both HIV-negative (38% at baseline, 30% at the follow-up visit) and HIV-positive (27% at baseline, 35% at the follow-up visit) women. The relationship between the vaginal microbiota and persistent hrHPV was modified by HIV status. In HIV-negative women, women with Lactobacillus dominant microbiota had lower odds (OR: 0.35, 95% CI 0.14–0.89, p = 0.03) of persistent hrHPV compared to women with Lactobacillus deficient microbiota. While among HIV-positive women, the odds of being persistently infected with hrHPV was higher in women with Lactobacillus dominant microbiota (OR: 1.25, 95% CI 0.73–2.14 p = 0.41). This difference in effect estimates by HIV was statistically significant (p = 0.02). A high diversity vaginal microbial community with paucity of Lactobacillus species was associated with persistent hrHPV infection in HIV-negative women but not in HIV-positive women.

peptides 10,11 . Disruptions in these defence mechanisms, as could occur with high diversity vaginal microbiota, may modulate risk of viral infections such as HIV 12 and HPV 13 . High diversity vaginal microbial communities are associated with proinflammatory responses which may directly damage the cervical epithelial barrier, thereby facilitating HPV entry to basal keratinocytes and establishment of infection 4,14 . The microbiota may also interact with cellular pathways in ways that are not yet clear, to lead to persistent, productive viral infections [14][15][16][17] .
Most 16S rRNA sequencing studies of the vaginal microbiota and hrHPV infection to date have focused on prevalent hrHPV infection and vaginal microbiome patterns 6,13,16,[18][19][20][21][22] . Collectively, these studies showed increased risk of prevalent HPV infection in women with high diversity vaginal microbiota or with decreased relative abundance of Lactobacillus species 13,16,19 . Thus far, only one study has investigated the association between the vaginal microbiota and persistent hrHPV 21 . In that study, vaginal samples collected at one baseline visit were used to characterize the vaginal microbiota, while hrHPV persistence was determined by testing two serial samples taken 12 months apart 21 . Given that the vaginal microbiota is dynamic, cross-sectional studies of the vaginal microbiota that rely on vaginal sampling at a single time point are inadequate to fully characterize the relationship between the vaginal microbiota and persistent hrHPV 23 . Therefore, longitudinal studies with repeated vaginal microbiota assessments are warranted. Earlier work by Brotman et al. to address this knowledge gap involved extensive sampling of the vaginal microbiota and repeated HPV testing. However, that study was limited by the lack of data on high risk HPV 24 .
We previously reported an association between prevalent hrHPV and decreased relative abundance of Lactobacillus species 18 . In this study, we examine the relationship between the vaginal microbiota measured at two different time points and persistent hrHPV infection in a nested case control study of HIV-negative and HIVpositive Nigerian women.

Results
Study population. From a prospective population-based cohort study of 1020 women, we selected 211 women (94 cases of persistently any HPV infected and 117 controls of persistently HPV negative women) for inclusion in this nested case control study using a 1:1 match for HIV positive women and a 1:2 match for HIV negative women. Details of participant selection are provided in Fig. 1. Table 1 shows the baseline characteristics of the study population stratified by HIV status. Mean (SD) age of participants at enrolment was 38 (8) years while mean age at sexual debut was 20 (3) years. Most participants had more than six years of formal education (88%, 186/211) and were premenopausal (84%, 175/211). More than half of the participants were married (58%, 122/211) and douche regularly (60%, 126/211). The median interval (IQR) between baseline and follow-up visit was 6 (6-8) months.
Among cases, the commonest hrHPV types detected were HPV 52 and HPV 35 at both baseline (27%, 25/94 and 23%, 22/94 respectively) and follow up visits (31%, 29/94 and 19%, 18/94 respectively). Details of the frequency distribution of other high risk HPV types identified in women with persistent any HPV infection at baseline and follow up stratified by HIV status are shown in Table 2. In women with persistent hrHPV infection, type specific persistence (same hrHPV type present at baseline and follow up) was commoner (87%, 62/71) than group persistence (13%, 9/71). The commonest hrHPV types in type specific persistence were HPV 52 (31%, 22/71) and HPV 35 (24%, 17/71). Further details of HPV prevalence, persistence and clearance of the parent cohort have been previously published 25,26 .
There were 422 visits made by the 211 participants selected. Approximately, 84% (353/422) of vaginal microbiota samples taken at these visits were available for 16S rRNA sequence analysis which yielded 12, 967, 629 high quality reads. The mean (SD) number of reads per sample was 36, 021 (14,375), with a mean (SD) read length of 427 (6) base pairs.
Vaginal microbiome composition in the study population. Hierarchical clustering analysis based on species composition and relative abundance identified five major clusters that showed similarities to previously described vaginal microbiome community state types (CST) 23 . Three of these CSTs were dominated by high proportions of Lactobacillus specie: L. crispatus in CST I, L. gasseri in CST II and L. iners in CST III (Fig. 2). CST I-B had moderately high proportions of Lactobacillus species, most commonly L. crispatus as well as low proportions of pathobionts such as Enterococcus and Streptococcus, which are not commonly seen in CST I. CST IV-B was characterized by greater taxonomic diversity with low proportions of Lactobacillus, pathobionts such as Escherichia, Enterococcus and Streptococcus as well as bacterial vaginosis associated taxa such as Gardnerella, Atopobium, Megasphaera, Sneathia and Prevotella.
In HIV negative women, the point prevalence of CST IV-B was 38% (26/69) at the baseline visit and 30% (21/69) at the follow up visit ( Table 3). The difference in proportions (0.12; 95% CI − 0.06-0.30; p = 0.20) was not statistically significant. In HIV positive women, the point prevalence of CST IV-B was 27% (38/142) at the baseline visit and 35% (50/142) at the follow up visit. This CST distribution at the baseline and follow up visit was not statistically significant (difference in proportions = 0.06, 95% CI = − 0.10-0.14, p = 0.75). Of the Lactobacillus species dominant CSTs, CST III was the most prevalent at both baseline (15%, 10/69) and follow up (30%, 21 (Table 3). Markov chain transition probabilities indicated substantial variability in inter-CST transitions over the twotime points (baseline and follow-up visits) (Fig. 5). Patterns of transition probabilities were comparable among HIV positive and HIV negative women. In both HIV negative and HIV positive women, the highest inter-state transition probabilities were to the more diverse community (CST IV-B). While the highest self-transition  Table 1. Baseline characteristics of study population by HIV status. *Proportion of women who douche with water among women who regularly douche.

Association between persistent hrHPV infection and vaginal microbiota.
A priori, we considered the possibility of HIV modifying the relationship between the vaginal microbiota and persistent hrHPV based on results from previous studies and biological plausibility 18,27 . In effect modification analysis, the odds ratio of having persistent hrHPV for women with a Lactobacillus dominant microbiota (> 70%) versus a Lactobacillus depleted microbiota was multiplied by a factor of 3.39 (95% CI 1.18-9.70, p = 0.02) for HIV-positive women as compared to HIV-negative women. Therefore, we present effect estimates by HIV strata. Among HIV negative women, the odds of persistent hrHPV infection were significantly lower in women whose microbiota were dominated by Lactobacillus species (OR: 0.35, 95% CI: 0.14-0.89, p = 0.03) than women with Lactobacillus depleted microbiota. In HIV positive women, there was no significant association between persistent hrHPV infection and the vaginal microbiota in the Lactobacillus dominant model (OR: 1.25, 95% CI: 0.73-2.14, p = 0.41) ( Table 5). There were also no significant associations between the vaginal microbiota and persistent hrHPV in the other models (L.crispatus dominant [> 70%] model and Community State Assignment model) considered for both HIV-negative and HIV-positive women.
In sensitivity analyses varying cut points to dichotomize relative abundance of Lactobacillus species, with values ranging from 50 to 80% in increments of 5%, the association between Lactobacillus dominated microbiota and persistent hrHPV in HIV negative women was insensitive to cut off points between 50 and 80% indicating that results were robust within this range of cut points (Supplementary Table 1). Analysis of characteristics of Table 2. Distribution of hrHPV types in 94 women with persistence of any HPV.  www.nature.com/scientificreports/ participants missing microbiome data indicated that missingness at baseline was associated with socioeconomic status; while missingness at follow up was associated with HIV status (Supplementary Table 2). These variables (socioeconomic status, study visit and HIV status) were used as auxiliary variables during multiple imputation for sensitivity analysis. Effect estimates from the imputed data were similar to effect estimates obtained from complete case analysis (Supplementary Table 3). www.nature.com/scientificreports/ We used LEfSe modelling to identify specific bacterial taxa that may be differentially abundant in HIV negative women with and without persistent hrHPV; and HIV positive women with and without persistent hrHPV. We included HIV positive women in LEfSe analysis as the broad categories in the regression models may mask differences in non-dominant bacterial taxa. We found that in HIV-negative women, women with persistent hrHPV infection had significant over-representation of Leptotrichia species in their vaginal microbiota compared to HIV-negative women without persistent hrHPV infection (Fig. 6) Conversely, there was significant overrepresentation of Veillonella species in HIV-negative women who were persistently hrHPV negative compared to HIV negative women with persistent hrHPV infection (Fig. 6) . In HIV-positive women, we did not identify any discriminative features between the vaginal microbiota of women with persistent hrHPV infection and women who were persistently HPV negative.

Discussion
In this nested case control study of the vaginal microbiota and persistent hrHPV infection in HIV-negative and HIV-positive women, we found that a high relative abundance of Lactobacillus species was associated with reduced odds of persistent hrHPV infection in HIV-negative women but not in HIV-positive women. These results extend findings from our previous study, in which we showed a significant association between a low relative abundance of Lactobacillus species and increased odds of prevalent hrHPV, a more proximal causal intermediate in the relationship between hrHPV and cervical cancer 18 . Consistent with our previous report on prevalent hrHPV and other studies, we also identified Leptotrichia species to be over-represented in HIV-negative women with persistent hrHPV 18,21 .  www.nature.com/scientificreports/ While recent longitudinal studies with repeated vaginal microbiota assessments that use next generation sequencing techniques have shown an association between higher vaginal microbial diversity and prevalent hrHPV or persistent any HPV, our study is the first to show this for persistent hrHPV 19,24 . In one study conducted among Italian women, Di Paola et. al. provide evidence to show that a high diversity vaginal microbiota, CST IV, at baseline was predictive of future odds of persistent hrHPV, despite not accounting for the temporal dynamics of the vaginal microbiota 21 . Several other cross-sectional studies have showed that decreased relative abundance of Lactobacillus species is associated with precancerous cervical lesions and cervical cancer [28][29][30] . Collectively, these results suggest that the vaginal microbiome may play a role in cervical cancer pathogenesis.
It is plausible that the vaginal microbiome modulates risk of HPV acquisition 31 , although there is limited mechanistic evidence for this relationship. It is also possible, though less probable, that the presence of HPV interacts with the host immune system through complex mechanisms that lead to the presence of immunosuppressive CST I 6 (9) 2 (4) 1 (7) 3 (33) 11 (8) 4 (6) 6 (11) 1 (7) CST II 7 (10) 5 (11) 0 (0) 2 (22) 3 (2) 1 (1) 2 (4) 0 (0) CST III 10 (15) 7 (15) 2 (14) 1 (11) 13 (9) 7 (10) 5 (9) 1 (7) CST I-B 4 (6) 4 (9) 0 (0) 0 (0) 37 (26) 19 (27) 16 (28)  2 (14) CST IV-B 26 (38) 17 (37) 9 (64) 0 (0) 38 (27) 17 (24) 17 (30)  4 (29) Missing CST 16 (23) 11 (24)    www.nature.com/scientificreports/ cytokines that favour different vaginal communities 17,32,33 . Alternatively, sexual activity, a common cofactor for both vaginal microbiota and HPV infection may explain the associations observed. Consistent with previous studies on the temporal dynamics of the vaginal microbiota, we found that CST IV-B was the most stable CST and when transitions did occur, they were most commonly to the L.iners dominated CST III 23 . These findings of the stability of CST IV-B may be considered to be qualitatively similar to a longitudinal study among 80 African women which showed that high diversity associated bacteria such as Prevotella bivia and Gadnerella vaginalis were consistently present in 91% and 58% of healthy women with a Nugent score of 0-3 respectively, while L. crispatus was consistently present in only 47% 34 . These findings may reflect a lack of optimal protection against the growth of strict anaerobes by L. iners or the ability of L.iners to thrive in a range Table 4. Association between potential risk factors and vaginal microbiome. *Relative abundance of Lactobacillus species > 70%. **Relative abundance of L. crispatus > 70%. † Frequency of exposure at the two visits combined. The outcomes and some variables (oral contraceptive use, condom use, total sex partners in 1 year, total sex partners in lifetime, sex within 24 h, BMI, douche regularly, time since last menstrual period, menopausal state) were modelled as time varying covariates. Correlation within person measurements were accounted for in GEE models. Given the variability between participants that clustered in CST groups, we used a marker of community functionality-hence analysis by relative abundance of Lactobacillus. 70% is used at cut off because of the distribution of the relative abundance of Lactobacillus in the different CST groups. ‡ 59 participants were missing vaginal microbiota at baseline, and 13 participants were missing vaginal microbiota at follow up.  35 . Future studies on the functional metabolomics profile associated with L. iners may help to understand the role of this specie in the maintenance and potential restoration of a healthy vaginal microbiota. We found that HIV-positive women were less likely to have a high relative abundance of Lactobacillus species or L. crispatus compared to HIV-negative women. Microbial diversity has been reported to be an important risk factor for HIV acquisition and viral shedding 20,36,37 . Recent reports, in which HIV susceptible CD4 + cells are increased in the genital tract in women with high diversity microbial communities, provide further mechanistic evidence for the link between increased risk of HIV acquisition and viral shedding 15,16 . In addition, it is possible that the lower relative abundance of Lactobacillus species among HIV positive women in our study was due to immunosuppression resulting in subsequent colonization of genital tract by diverse bacteria.
Similar to previous studies, we did not find any significant association between vaginal douching and the vaginal microbiota 20,34,38,39 . The influence of hygienic practices such as vaginal douching on the microbiota may be mediated by factors such as the type of materials used for douching and timing in the menstrual cycle 40,41 . In a meta-analysis of intra-vaginal practices and bacterial vaginosis, which included 13 prospective studies in Sub Saharan Africa, vaginal douching with soap, but not other materials, was associated with bacterial vaginosis 41 . In our study population, majority of women who reported douching regularly, used water to douche, providing some explanation for our findings.
In this analysis, we did not find any association between lifetime number of sexual partners, number of partners in past year, recent sexual activity and the vaginal microbiota. Previous studies have reported reduced Lactobacillus species in women with recent sexual activity 23,34 . Our inability to detect these relationships may be due to the small sample size of women who reported recent sexual activity in our study.
Among HIV negative women, results from LEfSe analysis indicated that while Leptotrichia species was differentially more abundant in persistent hrHPV infected women, Veillonella species was differentially more abundant in hrHPV negative women. The association between Letptotrichia species and hrHPV infection has been reported in previous studies 18,21 . Less frequently reported is the observation of a higher abundance of Veillonella  www.nature.com/scientificreports/ in hrHPV negative women. Veillonella are anaerobic, gram negative cocci commonly found in the oral cavity, gut and vaginal tract that play diverse roles in different niches 42 . The pathogenic role of Veillonella is uncertain and they are currently considered to be of low virulence 43,44 . While Veillonella species have been associated with more diverse microbial communities, including the formation of biofilms leading to dental plaques in the oral cavity and serious infections such as endocarditis 45 , they have also been found to be more abundant in HPV negative women compared to HPV positive women 13,46 , as well as in the oral microbiota of HPV negative infants compared to the oral microbiota of HPV positive infants 46 . Paradoxically, while they may be less common in HPV positive vaginal microbiota, they are increasingly more abundant in distal end points in the natural history of cervical cancer pathogenesis, such as in precancerous lesions 13 , Bogert et al. observed that the cytokine response profile of Veillonella species within the gut microbiota is dependent on the combinations of the bacteria species present 47 . These findings taken together with the paradoxical observations, may highlight the interplay between the integrated responses to multiple representative species within a microbial environment in influencing disease risk and warrant a deeper functional characterization of the vaginal microbiota. The lack of association between Lactobacillus as a genus and hrHPV negative status in LEfSe analysis is noteworthy, given the strong associations between Lactobacillus species and hrHPV negative status that has been reported in previous studies. However, in this population, it must be noted that the commonest and the most stable microbial communities was the diverse CST IV-B community. This provides credence to current understanding that even in diverse communities, the important catabolic function, of lactic acid production, is conserved among communities despite differences in species composition.
Definitions of HPV persistence vary considerably across studies 48 . The commonest definitions include HPV positivity measured at two or more time points with an interval of six or more months 48 . At one extreme of this definition is the requirement for the same HPV genotype to be present at both time points (genotype-specific persistence), and the other extreme is the presence of any HPV at both time points (group persistence). Conceivably, HPV genotype-specific persistence is better suited for identifying women at increased risk of cervical cancer and its precursor lesions, as group persistence could be the result of one HPV type clearing and another being acquired. However, results from a population-based cohort study in Denmark have shown that women with hrHPV group persistence have increased odds of high-grade precancerous lesions that are almost 200fold higher than observed in women persistently negative for HPV 49 . Furthermore, in studies investigating the use of group persistence versus genotype specific persistence to inform referral practices, the use of genotype specific persistence did not improve risk stratification over the use of group persistence in women referred for colposcopy indicating that group persistence is a good proxy for measuring type-specific persistence and risk of high grade precancerous lesions 50,51 .
While our study is the single largest nested case control study of persistent hrHPV and the vaginal microbiota among HIV-negative and HIV-positive women in Africa to date, it has some limitations. As with all case control studies, there is potential for selection bias when the selected cases and controls are systematically different from the target population especially when the selection criteria are associated with both the outcome and the exposure of interest 52 . In this study, the risk of selection bias is minimized for two reasons. First the parent cohort is a longitudinal population-based cohort; cases and controls selected from this cohort are representative of the source population with regards to exposure-outcome relationships. However, the prevalence of the exposure and outcome may vary substantially between the selected study population and the source population. Therefore, we limit our inferences to associations between exposure and outcome and make no inferences about the generalizability of the prevalence estimates of hrHPV or vaginal microbiota observed in this population. Secondly, while our selection criterion was associated with the outcome (all cases were selected), it was not associated with the exposure of interest. This reduces the risk of bias due to selection.
We included microbiome data from only two time points and did not perform qPCR to determine concentration. The human vaginal microbiota composition can be highly variable over short periods of time due to several factors, therefore sampling over shorter time intervals for extended periods are required. However, we did not find significant transitions of CST-IV to other CSTs, suggesting that additional sampling among women with CST IV may not have changed our results. There was some degree of missingness for vaginal microbiota data which could be a source of bias if differential. However, results from sensitivity analysis where missing vaginal microbiota was imputed based on visit (baseline or follow up), socioeconomic status and HIV status was not markedly different from the results based on complete non-missing data. We assessed HIV status at enrolment into the study but did not repeat HIV testing at follow up. It is possible that some women classified as HIV negative at the beginning of the study seroconverted during follow up, resulting in misclassification. We did not account for the CD4 + counts, viral loads, ART use or duration of infection in the HIV positive women included in this study and these may have affected our findings for HIV positive women.
In summary, we found that HIV negative women with a high relative abundance of Lactobacillus species in the vaginal microbiota had lower odds of persistent hrHPV compared to HIV negative women with lower relative abundance of Lactobacillus species. There was also an inverse relationship between the relative abundance of Lactobacillus species and HIV infection, such that women who were HIV-positive had lower odds of a Lactobacillus specie dominant vaginal microbiota. Further work is required to understand the functional profile of the vaginal microbial communities and how this interacts with host factors to modulate risk of hrHPV persistence in the pathogenesis of cervical cancer.

Methods
Study population. We enrolled 1,020 women from our cervical cancer screening studies into a prospective study of host and viral factors associated with persistent hrHPV infection in Abuja, Nigeria between 2012 and 2013 53,54 . We included non-pregnant women who were at least 18 years old, had engaged in vaginal sexual inter- www.nature.com/scientificreports/ course and had no previous history of hysterectomy or cervical cancer. Trained nurses collected information on sociodemographic characteristics, self-reported HIV status, lifestyle factors, sexual behaviour and reproductive history from all participants. Participants were scheduled for a follow-up visit six months after the baseline visit. At both baseline and follow-up visits, nurses performed gynaecological examinations and collected biological specimens.
For this analysis, we used a nested case control design restricted to 211 women selected as cases and controls. Details of patient selection are provided in Fig. 1. Of the 1020 women in the parent cohort, 23 HIV negative women and 71 HIV positive women were persistently positive for any HPV. All of these 94 women were selected as cases in this nested study. For each case we identified either one (for HIV positive cases) or two (for HIV negative cases) controls from the cohort, matched by age (± 5 years) and study enrolment date (± 5 days) resulting in 117 controls in total. These 211 women provided data at two clinic visits resulting in 422 visits.
Sample collection and processing. We collected mid-vaginal swabs from all participants using the Elution Swab system (Copan, Italy) which was inserted in one ml Amies' Transport media (Copan, Italy) and stored at − 80 °C till further processing. Whole genomic bacterial DNA was extracted from the swabs using the MoBio Powersoil kit, as previously described 18 . Bacterial 16S rRNA gene sequencing and analysis. A dual barcode system with fusion primers 338F and 806R was used to amplify the V3-V4 hypervariable regions of the 16S rRNA gene as previously described 55,56 . Purified amplicons were sequenced on an Illumina Miseq instrument (Illumina, San Diego, CA) using a 300 bp paired end protocol at the Institute for Genome Sciences, University of Maryland School of Medicine. Raw reads were pre-processed to remove the first three and last three bases if their Phred score was lower than 3. Read ends were trimmed if the average Phred quality score of four consecutive bases was below 15 and only retained if their length was at least 75% of their original length after trimming. QIIME (v1.8.0) 57 was used to perform quality control of the sequence reads. Fast Length Adjustment of Short reads (FLASH) 58 was used to assemble paired reads, with an overlap of ~ 90 bp on average. De-multiplexing by binning sequences with the same barcode was performed in QIIME (v1.8.0). Chimera detection was done using de novo and referenced-based methods in UCHIME (v5.1) with the Greengenes database of 16S rRNA gene sequences (August 2013) as a reference 59,60 . The processed 16S rRNA gene amplicon sequences were assigned to genera and species, using PECAN that uses fifth-order Markov Chain model for precise species-level assignments and a pre-compiled database that contains all known microbes in the vaginal microbiota. Raw, demultiplexed, sequence files are available at the NBCI Sequence Read Archive (https ://www.ncbi.nlm.nih.gov/sra) with accession code PRJNA668516. Hierarchical clustering analysis based on Jensen Shannon distances using ward linkage was used to cluster samples into Community State Types (CSTs) using the vegan package in R 23 .
HPV detection. We extracted DNA from ectocervical cells and used the SPF 10 LiPA 25 system for HPV DNA detection 61 . Briefly, we performed polymerase chain reaction on DNA isolates using biotin labeled SPF 10 primers which generated small amplicons of 65 bp. The amplified PCR products were tested using DNA Enzyme Immunoassay (DEIA), a probe hybridization assay containing a cocktail of conservative probes that allows the recognition of at least 65 HPV genotypes. The amplimers of the DEIA HPV DNA positive samples were subsequently analyzed by the reverse hybridization line probe assay, LiPA 25 , Version 1 (Laboratory Biomedical Products, Rijswijk, The Netherlands) which can identify 25 HPV types, 12 of which are high-risk: 16, 18, 31, 33, 35, 39, 45, 51, 52, 56, 58, and 59. In this analysis, we use group persistence as the definition of persistent hrHPV, that is, the presence of at least one hrHPV type at baseline and one hrHPV type at the follow up visit as opposed to type specific persistence, which is the presence of identical hrHPV types at the two visits.
Covariates. We collected information on potential confounders such as menopausal state, time since last menstrual period, douching practice, total number of sexual partners in past year, body mass index (BMI), vaginal pH, sexual activity within 24 h of sampling, use of condoms and hormonal contraceptive use. Vaginal pH was assessed as an ordinal variable with three categories: < 4.5, 4.5-5.5 and > 5.5. These covariates were assessed at the baseline visit and updated at the follow up visit. They were included in the models as time varying exposures. The vaginal microbiota was also modelled as a time varying exposure. In addition to CST based models, we conducted analysis based on the relative abundance of Lactobacillus (at the genus level) and L. crispatus. We specifically investigated L.crispatus at the species level because, of all the species of the Lactobacillus genus, commonly identified in the vaginal microbiota, communities with high relative abundance of L.crispatus have the lowest vaginal pH values, a marker for important vaginal defence mechanisms 20,33,62 . Statistical analysis. We compared the characteristics of participants at baseline by HIV status. We used generalized estimating equation (GEE) logistic regression models with robust variance estimation and the cluster option, which accounts for repeated measures, to investigate the relationship between potential risk factors and the vaginal microbiota; and the association between the vaginal microbiota and persistent hrHPV infection. Variables considered for inclusion in multivariable regression models were selected using a two-step approach. First, sociodemographic characteristics and potential risk factors associated with persistent HPV infection at a nominal p < 0.10 in the parent cohort of this study were identified from previously published results 25 . The variables identified from this step were BMI, marital status, education, socioeconomic status, age at sexual initiation, total lifetime sex partners, vaginal douching, vaginal pH and HIV status. Next, we examined potential risk factors and its association with vaginal microbiota in this cohort of participants. Variables that were significantly www.nature.com/scientificreports/ associated with the vaginal microbiota at p < 0.10 were identified. Finally, variables that were associated with both persistent HPV infection and the vaginal microbiota were selected for inclusion in multivariable models. In modelling the vaginal microbiota, we considered three approaches: dichotomising based on L. crispatus relative abundance (using a cut off of 70%), dichotomising based on all Lactobacillus species relative abundance (using a cut off of 70%) and using CST assignments. To select the cut-off point, we used the mean of the relative abundance of L. crispatus in CST I because L. crispatus is often considered a marker of a healthy microbiome 19,63 . We investigated the sensitivity of results to various cut points by varying the cut points in increments of 5% from 50 to 80%. The upper and lower bounds for these sensitivity analyses were selected to ensure that we reached a point where further dichotomization would become uninformative.
We used Markov chain models generated by inferring transition probabilities from our data to evaluate vaginal community dynamics over the two time periods, the baseline visit and the follow up visit.
We used Linear Discriminant Analysis (LDA) effect size (LEfSe) algorithm to identify bacterial taxa that were differentially present in women with persistent hrHPV infections 64 .
Of the 422 clinical visits included in this study, results for 69 vaginal microbiota samples were missing. We performed univariate analysis for missingness at baseline and at the follow up visit to determine whether missingness was associated with important covariates such as HIV status, persistent hrHPV, age, socioeconomic status, education, menopausal state, time since last menstrual period, douching practice, BMI, sex within the last 24 h prior to sample collection and total number of sexual partners in the past one year. Furthermore, to minimize the potential for selection bias, we imputed missing microbiota data using a data augmentation algorithm in a Markov Chain Monte Carlo (MCMC) procedure creating 10 datasets in 1000 iterations. We included three variables as auxiliary predictor variables (socioeconomic status, study visit and HIV status) identified from the univariate analysis of missingness. Next, we implemented GEE logistic regression models in the imputed dataset and compared results obtained from the imputed data with results obtained from complete case analysis.
Based on previous studies that have provided evidence of increased microbial diversity and a higher risk of persistent hrHPV infections among HIV positive women compared to HIV negative women 25,27,65 , we considered the possibility of HIV modifying the relationship between the vaginal microbiota and persistent hrHPV infection, a priori. We tested for this statistically by fitting an interaction term in the GEE logistic regression model and comparing the fitted model to a model without the interaction term using likelihood ratio tests. The interaction term was statistically significant (p = 0.02). Therefore, we present results by HIV strata.
All analyses were conducted using Stata/SE 15 (StataCorp, College Station, Texas) and R version 3.4.0 (R Core Team, Vienna, Australia).

Ethics.
All methods were carried out in accordance with relevant guidelines and regulations. We obtained ethical approval to conduct this study from the National Health Research Ethics Committee of Nigeria and the University of Maryland School of Medicine Institutional Review Board. All participants provided written informed consent prior to enrolment. www.nature.com/scientificreports/