Introduction

Typically, the vaginal microbiota of healthy women has low diversity relative to other body sites1. This low diversity is consistent with the low pH (< 4.5)2,3 maintained by lactic acid produced by the predominant Lactobacillus species, which is thought to restrict colonization by vaginal pathogens4,5. However, not all Lactobacillus species are considered equally beneficial6.

A decrease in Lactobacillus spp. abundance is often associated with a condition commonly known as bacterial vaginosis (BV)2. Bacterial vaginosis is associated with symptoms such as discharge with a fishy odor and discomfort, but some women with BV remain asymptomatic7. Importantly, vaginal profiles with high bacterial diversity have been associated with increased risk of pelvic inflammatory disease8, preterm birth9,10, and sexually transmitted infections (HIV11, herpes simplex virus type 2 (ref. 12) and human papillomavirus (HPV)11,13,14,15,16,17,18). Increased vaginal bacterial diversity has been reported after antibiotic therapy19 and with personal vaginal hygiene practices20,21, some sex practices22,23, high-fat diet24,25, smoking26, and other factors that may vary across lifestyles. Despite the protection associated with high proportions of vaginal Lactobacillus, profiles dominated by Lactobacillus iners have been associated with increased risk of Chlamydia trachomatis infection27. Additionally, differences in the vaginal microbiota have been associated with ethnicity (a composite of genetic and behavioral attributes); Black and Hispanic women have higher vaginal microbiota diversity28 than Caucasian and Asian women29,30,31,32.

Most studies on vaginal microbiota have focused on urban women in developed countries15,29,30,31,32,33, and only a few refer to women from urban34 or rural areas in developing countries11,35. Amerindians, peoples of Asian descent whose ancestors expanded into the Americas 14,000–24,000 years ago36,37, remained in isolation, with genetic drift leading to genetic divergence38,39,40,41,42. The arrival of Europeans and Africans in colonial times led to admixture43, resulting in the mestizo population that currently predominates in South America. Yet there are still villages of isolated Amazonian communities living pre-agricultural lifestyles, with low frequency of medical visits, no running water or electric services, no market economy, and subsisting on fishing, hunting, and agriculture practices. Increased access to jobs, education and healthcare services has led to migration to urban areas, thus starting a process of urbanization. In addition to the isolated and the urban Amerindians, other communities are in transition, with intermediate access to healthcare, public services and industrialized products. This transition occurs in a heterogeneous fashion within communities, with different individuals exposed differently. For example, in communities with low exposure to urban settings, a few villagers may be the ones who travel to urban locations, either with incentives (teachers, nurses) or to sell their products in markets, and they slowly adopt urban practices.

Since different lifestyle practices and ethnicities may influence the human vaginal microbiota, we decided to explore the vaginal microbiota of Amerindians along a gradient of urbanization, and to compare Amerindians with mestizos. In this study, we compared cervicovaginal and introital microbiota in Amerindian Piaroa women by urbanization status, and with mestizo women living in a town. Additionally, we considered HPV infection status in the analysis.

Results

The study included 7 different communities of the Venezuelan Amazonas state (Fig. 1a). A total of 228 women received medical services. Out of this sample, 111 (49%) sexually active volunteers complied with the inclusion criteria (see methods; Fig. 1b). The mean age of the participants was 28.9 years (12 to 53 years old).

Figure 1

Study design. (a) Diagram of geographic locations (black points in the map) of the 7 villages and the urban town (Puerto Ayacucho, indicated with a star) of the study. Pictures depict the town (top) and Piaroa communities with intermediate and low access to urban services (middle and bottom pictures, respectively). (b) Numbers of women recruited, urban stratification, sampling, and sample analyses. In traditional communities, permission by the captain (Chief) preceded individual consent. In the urban town there was a public invitation to participate. Of the 228 women who received a gynecological evaluation, 111 agreed to participate and complied with the inclusion criteria. A total of 82 Amerindians and 29 mestizos were included in the study. Surveys to assess urbanization status and clinical conditions were administered. A gynecologist sampled the vaginal introitus and cervicovaginal (endo/ectocervix/posterior fornix) sites using sterile swabs; a Papanicolaou smear was performed and vaginal pH was measured. Additionally, blood for HIV, hemoglobin, and hepatitis testing and fecal samples for parasite detection were collected for ancillary studies. DNA was extracted from swabs and used for human papillomavirus (HPV) detection and amplification of the V1-V3 regions of the 16S rRNA gene, which were then sequenced with Illumina MiSeq.

There were marked differences in lifestyle among Amerindian urbanization groups. With increasing urbanization, industrialized food consumption, education, and sexual contact with mestizo men increased, whereas crop gardening practices and number of pregnancies decreased (Table S1). Among health variables, body mass index increased with urbanization (Table S2).

Crop gardening practices were more frequent in Amerindians from the high urbanization group than in mestizos. Mestizo women also reported having sexual practices other than vaginal, sexual contact with mestizo men, and use of vaginal douche (Table S1). All women reported monogamous practices and no practice of female or male circumcision.

Cervicovaginal and introital microbiota

Samples were taken by an MD at the cervicovaginal and introital vaginal sites, and immediately frozen until analysis. DNA was extracted and the V1-V3 region of the 16S rRNA gene was amplified and sequenced. A total of 2,771,167 sequences (13,196 sequences/sample) were obtained from 222 cervicovaginal and introital samples from 111 women (eight environmental controls generated only 152 sequences). The low sequence number and the pattern in the environmental controls suggest that contamination had no significant effect on the composition of the human samples. All environmental control samples were lost after rarefying at 1,655 sequences/sample. In addition, 25 (11%) samples with a low sequence number were also lost. The total number of samples analyzed was 196: 95 cervicovaginal and 101 introital (Fig. 1b, Table S3).

The Amplicon Sequence Variant (ASV) table contained 1,015 different ASVs across all samples (mean of 15.4 and 20.1 ASVs for cervicovaginal and introital samples, respectively, Table S3). ASVs assigned to the same taxonomy were combined and this final table was used for the analysis. The average observed and estimated ASV taxa per sample were 9.19 and 9.32, respectively, showing a good sampling coverage of 92% according to Chao1. A total of 115 cervicovaginal and 161 introital ASV taxa were identified (Table S3), of which, interestingly, 18 matched unidentified bacteria (Table S4), and 8 had sequence identity < 97% to sequences in the NCBI non-redundant nucleotide database using the megaBLAST algorithm.
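
The coverage figure above compares observed richness with a Chao1 estimate. As a minimal sketch only, the Python below computes a bias-corrected Chao1 value and the observed/estimated ratio for one sample. The text does not state which Chao1 variant was used or how per-sample values were averaged, so the bias-corrected form, the function names, and the example counts are all assumptions made for illustration.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate for one sample's taxon counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    # Assumed variant: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def coverage(counts):
    """Observed richness as a fraction of the Chao1 estimate."""
    estimate = chao1(counts)
    if estimate == 0:
        raise ValueError("sample has no observed taxa")
    observed = sum(1 for c in counts if c > 0)
    return observed / estimate


# Hypothetical sample: five observed taxa, two singletons, one doubleton.
example = [10, 5, 1, 1, 2, 0]
print(chao1(example), round(coverage(example), 3))
```

When a sample has few singletons, the estimate stays close to the observed count, which is the situation the text describes as good sampling coverage.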

Beta diversity comparisons among all women groups (Amerindian urbanization groups and mestizos) showed significant differences for both vaginal sites using unweighted UniFrac distances (p ≤ 0.024, PERMANOVA, cervicovaginal: R2 = 0.10, and introital: R2 = 0.09; Fig. S1a) or marginally significant differences for Bray Curtis dissimilarities (p > 0.057, PERMANOVA, R2 = 0.062; Fig. S1a). However, since ethnicity and location might be confounding factors, comparisons were conducted only among the Amerindian group, controlling for village. Fewer differences were found across the urbanization gradient at either vaginal site (p > 0.340, Bray Curtis dissimilarity, PERMANOVA, cervicovaginal: R2 = 0.056, and introital: R2 = 0.037; Fig. 2a,b, Table S5). Alpha diversity comparisons among all groups of women showed marginal differences, which were lost after p value adjustment for multiple comparisons (p = 0.045, p.adj = 0.135, Faith PD; p > 0.108, for Shannon or Simpson; Fig. S1c). Analyses using a linear mixed-effect model with location as a random effect, conducted among Amerindian groups only, showed a tendency for diversity to increase with urbanization, which was not significant (p > 0.061, for Shannon and Simpson; linear mixed-effect model, LMM; Fig. 2c, Table S5).
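
For readers unfamiliar with the Bray Curtis dissimilarity used in these comparisons, the following is a minimal Python sketch of the pairwise calculation on two count vectors. It covers only the distance itself, not the PERMANOVA test; the function name and the example vectors are hypothetical.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples' taxon counts.

    Both vectors must list the same taxa in the same order.
    """
    if len(u) != len(v):
        raise ValueError("count vectors must have the same length")
    total = sum(u) + sum(v)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(u, v)) / total


# Hypothetical rarefied counts for two samples over three taxa.
sample_a = [10, 0, 5]
sample_b = [5, 5, 5]
print(bray_curtis(sample_a, sample_b))
```

The value ranges from 0 (identical composition) to 1 (no shared taxa), and it weights abundant taxa more heavily than presence/absence measures such as unweighted UniFrac.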

Figure 2

Cervicovaginal microbiota diversity. (a,b) Principal Coordinate Analysis (PCoA) of Bray–Curtis dissimilarity for Amerindians among urbanization groups (low n = 15, medium n = 20, high n = 20) (a), and between ethnicities (Amerindians n = 15, mestizo n = 27; PERMANOVA) (b). Gray lines connect samples with the group centroid. Ellipses indicate one standard deviation. (c,d) Alpha diversity using Shannon index among Amerindian urbanization groups (linear mixed-effect model, LMM) (c), and between ethnicities (Kruskal–Wallis test) (d). (e) Discriminant taxa analysis between ethnicities (LEfSe).

There were significant vaginal beta diversity differences between Amerindians as a group and mestizos (p < 0.030, unweighted UniFrac distance, PERMANOVA, cervicovaginal: R2 = 0.036; and introital: R2 = 0.019; Fig. S1b) and marginal differences for Bray Curtis dissimilarity (p > 0.068, PERMANOVA, R2 = 0.022). However, an analysis including only Amerindian and mestizo women at the high urban level and living in the same location was not significant at either vaginal site (p > 0.322, PERMANOVA, Bray Curtis dissimilarity; Table S5, Fig. 2d). Marginal or no alpha diversity differences were detected between ethnicities at either vaginal site (p > 0.066 for Faith PD, Shannon or Simpson, Kruskal–Wallis; Fig. S1c), and even fewer differences when only urban Amerindians and mestizos from the same location were compared (p > 0.771, for Shannon and Simpson; Kruskal–Wallis; Fig. 2c, Table S5).
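
The Shannon and Simpson indices reported in these alpha diversity comparisons can be sketched in a few lines of Python. This is an illustration only: the logarithm base for Shannon is not stated in the text (base 2 is assumed here), Simpson is taken as one minus the sum of squared proportions, and the function names and example counts are hypothetical.

```python
import math


def shannon(counts, base=2.0):
    """Shannon diversity index; the log base is an assumption (default 2)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) / math.log(base) for p in proportions)


def simpson(counts):
    """Simpson diversity, taken here as 1 minus the sum of squared proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical samples: one dominated by a single taxon, one perfectly even.
dominated = [97, 1, 1, 1]
even = [25, 25, 25, 25]
print(shannon(dominated), shannon(even))
print(simpson(dominated), simpson(even))
```

A sample dominated by one taxon, as in an L. iners-dominated profile, scores low on both indices, whereas an even community scores high.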

Discriminant analysis between groups of women showed that the majority of the differences observed were between mestizo and Amerindian groups, for both vaginal sites (Fig. S2). Anaerobic taxa were particularly enriched in mestizos. Comparisons among Amerindian urbanization groups only did not show any differential taxa (LEfSe, alpha = 0.05; Table S6). However, in terms of prevalence, Dialister micraerophilus increased with urbanization (0%, 5% and 30% for low, medium and high urbanization groups, respectively; p = 0.021, Fisher's exact test, Table S7). Discriminant taxa analysis between ethnicities, controlling for urbanization and location, showed that Amerindians had significantly lower abundance of Prevotella and Mobiluncus mulieris, and higher abundance of Peptoniphilus lacrimalis and Brevibacterium linens, compared to mestizos (LEfSe, alpha = 0.05; Fig. 2e); interestingly, Mobiluncus mulieris was uniquely present in mestizos (in 7/27 women). Taxon prevalence also supported these results (Table S7). No discriminant taxon was found for introital samples between groups of women. A total of 30 taxa were shared among Amerindians from all urbanization levels, and 42 taxa were shared among urban Amerindians and mestizos; 19 taxa were uniquely found in Amerindians and 16 taxa in mestizos (Table S9). In general, most of the taxa unique to a group were found in only one or two women.

Hierarchical clustering analysis of cervicovaginal samples yielded four clusters or community state types (CSTs; clustering Silhouette index value of 0.378, see methods; Fig. 3b,c): L. iners dominated (CST-L. iners), associated with lowest vaginal pH and alpha diversity (Table S9); G. vaginalis dominated (CST-G. vaginalis), associated with intermediate alpha diversity; and two highly diverse clusters, associated with highest vaginal pH and alpha diversity: CST-div1, enriched in bacterial vaginosis-associated bacterium-1 (BVAB1), Lachnospira, Peptoniphilus koenoeneniae, Prevotella amnii, among others; and CST-div2, enriched in Sneathia sanguinegens, Leptotrichia amnionii, Prevotella bivia, Atopobium vaginae, among others.

Figure 3

Cervicovaginal microbiota composition and community state type (CST) clustering. (a) Individual vaginal taxa plots; the legend shows the 12 most abundant taxa. (b) Heatmap of the 30 most abundant taxa by cervicovaginal CSTs resulting from the hierarchical clustering analysis (see text for explanation). Right hand boxplot shows Shannon index for each cervicovaginal CST: CST-L. iners < CST-G. vaginalis < CST-div2 = CST-div1. Different letters over the boxplots indicate significant differences (Kruskal–Wallis, p < 0.001). Vaginal pH is also shown. (c) Discriminant taxa for each cervicovaginal CST (LEfSe, p < 0.01).

Although the proportions of cervicovaginal CSTs were not associated with urbanization level or ethnicity (p > 0.318, log-linear model, ANOVA), the low-pH-associated cervicovaginal CST-L. iners was more prevalent than other CSTs in the low urbanization group (p = 0.001, Fisher’s exact test; p = 0.042, post hoc test; Fig. 4a,c). On the other hand, introital samples clustered in only two groups (clustering Silhouette index value of 0.466, see methods): (1) the low-pH, low-diversity L. iners-dominated cluster (CST-L. iners); and (2) CST-div3, a high diversity cluster enriched in Gardnerella vaginalis, Sneathia sanguinegens, Porphyromonas uenonis, among others (Fig. S3a,b). There were no significant differences in introital CSTs with urbanization or ethnicity (p > 0.317, log-linear model, ANOVA; Fig. 4b,d).

Figure 4

Bacterial community state types (CSTs) from cervicovaginal and introital sites in Amerindian urbanization groups and mestizos. (a,b) Heatmap of the 30 most abundant taxa in cervicovaginal (a) and introital (b) microbiota. There were no significant differences among groups for Shannon index and vaginal pH, shown at the bottom of the heatmaps. (c,d) Prevalence of each of the four cervicovaginal (c) and two introital (d) CSTs, by woman group. 95% confidence intervals are indicated on each bar. Although no association with urbanization was found for CSTs (log-linear model), comparison within each urbanization group showed significant differences (*) among cervicovaginal CSTs in the low urbanization group (Fisher’s exact test), but not among the introital CSTs. NA indicates missing value.

Lifestyle, health, and demographic variables (Tables S1 and S2) were analyzed for association with CSTs (Fisher’s exact test, Table S9). Crop gardening, a surrogate of traditional lifestyle, was associated with low-diversity CSTs (cervicovaginal CST-L. iners and CST-G. vaginalis; p < 0.001, log-linear model). Cytological smears with BV signs (≥ 20% of clue cells) were associated with the high diversity cervicovaginal CST-div1 (p < 0.001, log-linear model).

Comparison between the diversity of cervicovaginal and introital samples showed non-significant differences for beta diversity (p = 0.243, R2 = 0.011, PERMANOVA; Fig. S3c) or alpha diversity (Shannon and Simpson index, p > 0.230, Kruskal–Wallis test; Fig. S3f). However, there was 78% congruency of bacterial community state types (such as CST-L. iners or not) between the cervicovaginal and introital sites of individual women (moderate strength of agreement, with a Cohen’s Kappa = 0.553; Fig. S3c). In Amerindians, Dialister micraerophilus and Ureaplasma parvum were significantly higher in the cervicovaginal than in the introital microbiota, while in mestizo women Anaerococcus christensenii was higher in the introital than in the cervicovaginal site (p < 0.050, LEfSe; Fig. S3d). No discriminant taxa were observed between the two vaginal sites within each urbanization group.
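
The agreement statistic quoted above, Cohen's Kappa, corrects the raw congruency for the agreement expected by chance. A minimal Python sketch follows, assuming each woman has one CST label per site; the function name, labels, and data are hypothetical and are not the study's values.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two paired label sequences (one pair per woman)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement from the marginal label frequencies at each site.
    expected = sum(freq_a[k] * freq_b[k] for k in set(freq_a) | set(freq_b)) / n ** 2
    if expected == 1.0:
        return 1.0  # degenerate case: both sites carry one identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical paired labels: "L" = CST-L. iners, "D" = any other CST.
cervicovaginal = ["L", "L", "D", "D"]
introital = ["L", "D", "D", "D"]
print(cohens_kappa(cervicovaginal, introital))
```

In this toy example the raw congruency is 75% but kappa is lower, which is why a 78% congruency can correspond to only moderate agreement.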

Lactobacillus iners, followed by Gardnerella vaginalis, was the most abundant taxon in the vaginal microbiota of these populations (Fig. 3a). Close to half of all women (46%) harbored cervicovaginal or introital microbiota dominated by Lactobacillus spp., with L. iners being the dominant species within the genus (45%, Fig. 3a, Table S11). However, as many as 70% of the cervicovaginal samples contained at least one Lactobacillus sp., although some at very low relative abundance (Table S11).

An independent analysis was performed among all ASVs classified as L. iners. From a total of 90 L. iners ASVs, 43 were present in more than one woman. The three most prevalent L. iners ASVs were found in 66%, 5%, and 5% of women and the rest in < 2%. No association with urbanization, ethnicity or HPV infection was found for these three L. iners ASVs.

HPV infection

Microbiota diversity did not differ significantly between HPV-positive and HPV-negative samples, between high-risk and non-high-risk HPV types, or among samples with only high-risk, only low-risk, and both risk types (for beta diversity p > 0.495, PERMANOVA; Fig. S4a; for alpha diversity p > 0.380, Kruskal–Wallis test; Fig. S4d, Table S12). Additionally, no alpha diversity associations were detected for any of the HPV types (p > 0.050, Fig. S4e). However, discriminant taxa analysis showed a significantly higher relative abundance of Coriobacteriaceae and Anaerococcus tetradius associated with lack of HPV infection (p < 0.05, LEfSe; Fig. S4b). The relative abundance of Prevotella amnii was significantly higher in women infected only with high-risk HPV types (Fig. S4c), while Prevotella was positively associated with HPV16 presence (data not shown; ASVs classified up to genus Prevotella are other than those classified as Prevotella amnii). Finally, HPV infection or HPV types were not associated with CSTs, not even when considering only two groups of microbiota profiles, Lactobacillus-dominated or non-Lactobacillus-dominated, for both vaginal sites (p > 0.050, Fisher’s exact test).

Discussion

Previous work has examined the oral mucosa44,45, skin45,46, and gut45,47 microbiota of Amerindians and other populations, but this work pioneers the study of the vaginal microbiota in relation to lifestyle. With the current sample size and statistical power, this study shows few cervicovaginal microbial differences across urbanization (a marginal to non-significant tendency of increasing diversity with urbanization), with some taxa discriminating urban Amerindians from mestizos. The differences are consistent with differences in habits that might influence the vaginal microbiome (sex practices22,23, vaginal douching20,21, diet24,25, antibiotic use19, smoking26, etc.).

Important limitations of this work are the low sample size (due to difficult access to remote communities and small village sizes) and the lack of genetic information to determine ethnicity (due to restrictions in the IRB permits). Ethnicity may be an influential compositional factor, but genetic background and lifestyle are difficult to separate. There is some level of admixture between Amerindians and mestizos. Based on an autosomal DNA genetic study, mestizos have a contribution of 67% European, 23% Amerindian, and 16% African48. Urban Amerindians included in this study may have no or lower admixture, since they have a short history of cohabitation with mestizos (~ 40 years), and inter-marriages are not common.

L. iners-dominated profiles were highly prevalent in both vaginal sites in the women of this study. This profile is also prevalent in the vaginal microbiota of North American Hispanic, Asian and African-American women29. It is also found in women in Africa35,49,50 and in Hispanics from the Caribbean51. Other Lactobacillus-dominated vaginal profiles prevalent in other populations (such as > 15% L. crispatus, > 5% L. gasseri29,30,31) were rare in this study and did not form an independent cluster in the hierarchical clustering. An L. crispatus-dominant profile was reported as common in Tokyo31. Cervicovaginal L. crispatus was not absent: although it did not form an independent cluster, it was identified in 15% of the samples, and in 3.2% as the dominant taxon (> 50% relative abundance). However, the differences between the CSTs observed in these populations merit further studies (using marker genes such as the chaperonin-60 universal target, and comparing with other populations) to confirm that healthy women harbor very low L. crispatus. No significant primer bias is expected, as we amplified the V1-V3 hypervariable region of the 16S rRNA gene with primers including seven different 27F primer sequences to capture a broad spectrum of taxa, including L. crispatus52,53,54,55,56. We also found a relatively high prevalence of the low diversity G. vaginalis-dominated profile, previously reported as common among women32,57. The high prevalence of these two low diversity clusters is consistent with the low number of taxa described in this population (average of 9 taxa/woman). Additionally, the data showed two polybacterial clusters (high diversity CSTs) in the cervicovaginal site, also described in other reports29,58. Small differences between vaginal sites were observed, with moderate congruence of CST-L. iners in both sites, and some discriminant taxa. Comparison between cervicovaginal and introital microbiota in previous studies showed similarities in some59 and differences in others53,60.

Certain bacterial profiles have implications for the temporal stability of the vaginal microbiota58. Ecologically, shifts from L. iners-dominated to non-Lactobacillus-dominated profiles are more frequent than shifts to any other Lactobacillus-dominated profile57,61,62. Genomic and metabolic characteristics suggest that L. iners is better adapted to environmental changes, such as changes in hormonal levels (reviewed in ref. 6). L. iners dominance might constitute an advantage for maintaining a Lactobacillus-dominated profile in women with frequent hormonal changes due to high pregnancy rates (mean of 4.4 pregnancies per woman in this study). If genetic factors are relevant, we could expect similarities between Amerindians and Asians, given the Asian ancestry, and indeed, other studies have shown high prevalence of L. iners-dominated vaginal profiles in Asians living in the USA29 and South-Asian Surinamese32 women.

The low urbanization group had the highest prevalence of cervicovaginal CST-L. iners in relation to other CSTs. Interestingly, CST-L. iners and CST-G. vaginalis were positively associated with a surrogate marker of a traditional lifestyle, crop gardening, which involves factors that may affect the microbiota, such as physical activity (reviewed in ref. 63) and frequent contact with bacteria-rich environments (reviewed in ref. 64).

Ethnicity comparisons were performed between high urban level Amerindians and mestizo women living in the same location. Ethnic groups may reflect both genetics and lifestyles; for example, even under urban conditions, traditional practices may be culturally preserved (urban Amerindian women still maintain traditional crops, tend to have only vaginal intercourse with Amerindian men, and do not adopt practices such as vaginal douches, used by 42% of mestizo women). There were few discriminant species between Amerindian and mestizo women. Amerindians had a high prevalence of Brevibacterium linens, an aerobic bacterium rarely reported in the human vagina, rather common in cheese (reviewed in ref. 65), and associated with preterm birth66. Amerindians also had high Peptoniphilus lacrimalis, related to persistent BV67. Mestizos had an overrepresentation of vaginal Mobiluncus mulieris and Prevotella, both typically associated with high diversity profiles29 and linked to BV68,69,70,71. Both taxa have been previously correlated with high genital pro-inflammatory cytokine levels (IL-1α, IL-1β, TNF-α), reflecting an elicited immune response62. Mobiluncus mulieris seems to facilitate the growth of G. vaginalis and other BV-associated taxa (reviewed in ref. 72).

The high prevalence of HPV (77%) found in this population73 may be associated with the general lack of L. crispatus6,14. High HPV prevalence has been associated with high L. iners and G. vaginalis17 or with non-Lactobacillus profiles18,74, as opposed to L. crispatus6,14. High-risk HPV types are known to elicit inflammatory responses (IL-1α, IL-1β, and IL-8) from cervical epithelial cells62. In this study, HPV-16 was associated with Prevotella amnii, an association also reported previously13. The relationship between HPV and the vaginal microbiota might be important since high diversity has been associated with HPV infection and with cervical inflammation18,74, mucin-degrading activity (sialidases) and cytolysin (vaginolysin) that damage the epithelium75,76.

In summary, Amerindians and mestizos show vaginal profiles dominated by L. iners, by G. vaginalis, and high diversity profiles, with consistency between vaginal sites. A few taxa discriminated between urban Amerindians and mestizos, and there was a trend toward increased diversity with urbanization in Amerindians. Additional inquiries into host genetic information could improve our understanding of the factors underlying these changes and their significance for women’s health.

Materials and methods

Study design

This study is part of a larger project aimed at determining the effect of urbanization on the microbiota of Amerindians, of which we have published the prevalence and types of vaginal HPVs73. Amazonian women, including Amerindians and mestizos of the northern area of the Amazonas State, Venezuela, were invited to participate in the study. Protocols and informed consent were approved by SA Centro Amazónico de Investigación y Control de Enfermedades Tropicales Simón Bolívar, Venezuela (SACAICET, IRB #78-2014), and University of Puerto Rico (IRB #1314-163).

Amerindian women were from the ethnic group Piaroa (women with both parents Piaroa who self-identified as such and speak the Piaroa language). After permission was granted by the village Captain, each woman gave her informed consent for study participation (signing or stamping her fingerprint); for participants under 18 years old, informed consent from a parent and/or legal guardian was obtained. Sexually active women, ranging from 12 to 53 years old and compliant with inclusion criteria, were enrolled. Inclusion criteria included women of reproductive age who at the time of recruitment had none of the following: pregnancy, menses, bleeding in the last 24 h, sexually transmitted infection diagnosed in the last 2 months (for those who had medical access), antibiotics in the last month, vaginal douches in the last 24 h, sexual intercourse in the last 24 h, hysterectomy, diabetes, urinary incontinence, urinary tract infections, and HIV. Women were excluded (n = 117) mostly due to recent exposure to antibiotics or antiparasitic drugs (28%), menses (25%), post-menopausal status (13%), pregnancy (12%), urinary infections (8%), refusal to participate (4%), sexual contact in the last 24 h (3%), hysterectomy (2%), belonging to a different ethnicity (1%), diabetes (1%), and HIV (1%). One HIV-positive patient was sampled, although she was removed from the study after HIV diagnosis (she was already in treatment, although she denied being infected during the inclusion criteria questions). Those women who did not meet the inclusion criteria also received medical service. No information about the filial relationship among volunteers was recorded. However, a relative level of endogamy is expected, and it may be particularly high in smaller Amerindian communities.

Eight communities (Autana municipality in Amazonas state) and one urban town (Puerto Ayacucho, Amazonas state’s capital) formed a gradient of urbanization and were chosen for the study (described previously73). Some communities were located deep in the rainforest, a 2-day walk from the closest rural medical post, hence with scarce contact with the urban population. These remote communities have a medical visit once every 1–2 years for compulsory vaccination of children (the HPV vaccine has not been included in the public health program in Venezuela to date), have no running water or electrical services, and subsist on fishing, hunting and agriculture practices. Other communities show intermediate exposure, and other participants usually live in an urban town with health and other public services available and access to industrialized products (Fig. 1).

Clinical information (STI history) and lifestyle habits (number of sexual partners, diet), including typical urban practices (use of vaginal douches, contraception, and antibiotic use), were recorded from the recruited volunteers. Due to the social taboo around sexual practices, no information about simultaneous coupling or same-sex couples was collected; however, the Piaroa culture defines itself as being monogamous. Women were classified into three levels, namely low, medium or high urbanization level, based on their individual level of adoption of urban lifestyles or abandonment of traditional habits, as described previously (Urbanization survey73; Fig. 1). Individual-level urbanization was selected rather than community-based urban level for a higher classification resolution73.

Samples and sample processing

An obstetrician-gynecologist associated with the study team sampled two vaginal sites, the introitus and the endo/ectocervix and posterior fornix (referred to here as the cervicovaginal area), which was sampled last. Samples were obtained by rotating a sterile cotton swab over the area. Cervicovaginal samples were taken with the use of a disposable speculum with sterile saline solution for lubrication. Since samples were collected under various conditions, eight environmental controls were taken for each location visited by exposing a swab to the air for 20 s. Swabs were stored up to 4 h on ice in sterile and empty tubes, followed by storage in liquid nitrogen. Papanicolaou smears were also taken. Smear results described for endo/ectocervical abnormalities followed the Bethesda 2001 classification system, and reported the presence of clue cells, consistent with a diagnosis of BV77. Vaginal pH was measured by pressing an introital swab on a piece of pH paper and reading the color scale (range: 4–10, ColorpHast).

A drop of blood was taken from a finger prick for in situ hemoglobin measurement (using the Easylife rapid test in peripheral blood)78, and serum tests were performed for HIV, syphilis, and hepatitis B and C at the Public Health Center of Puerto Ayacucho. No specific assay for C. trachomatis was performed. HIV and hepatitis positive patients were excluded from the study. Fecal samples collected by each woman were preserved in tubes with iodine-formaldehyde for microscopy analysis of intestinal parasites.

DNA was extracted from swabs and feces using MoBio (CA, USA) PowerSoil according to the instructions provided by the manufacturer. Extracted DNA from samples was stored at − 20 °C until sequencing. The V1-V3 region of the 16S rRNA gene was amplified by PCR. Controls for reagents and DNA extractions were also amplified and sequenced. Amplifications were run in two rounds of PCR with dual barcode indexing using the primers 27F and 534R as described elsewhere54, although here a HotStar HiFidelity Polymerase (Qiagen) was used. Sequencing was performed on an Illumina MiSeq Instrument at the University of Idaho in 2017. Additionally, extracted DNA was used for determining cervicovaginal HPV status, which has been previously studied for this same population73. The method used was the reverse hybridization SPF10-PCR-LiPA25 system, version 1 (Labo Biomedical Products, Rijswijk, The Netherlands, based on licensed Innogenetics technology), which allows detection of 25 of the most common mucosal HPV types (types 6, 11, 16, 18, 31, 33, 34, 35, 39, 40, 42, 43, 44, 45, 51, 52, 53, 54, 56, 58, 59, 66, 68/73, 70, and 74).

Bioinformatic analyses

Paired-end sequences were merged using FLASH79, yielding a 460 bp fragment. Sequences were filtered for quality and Amplicon Sequence Variants (ASVs) were identified using DADA280. ASV classification to the species level was performed using SPecies level Identification of metaGenOmic amplicons (SPINGO81) with default settings. SPINGO is software dedicated to high-resolution assignment of sequences to the species level using partial 16S rRNA gene sequences. It uses a reference database built from full-length (≥ 1,200 bp) bacterial and archaeal 16S rRNA gene sequences obtained from the Ribosomal Database Project version 11.282, and sequences are labeled with species names according to the NCBI taxonomy (https://www.ncbi.nlm.nih.gov/guide/taxonomy/). In this study, SPINGO species-level classifications were accepted when their similarity scores were above 0.5. In cases where the similarity scores were below 0.5 for the SPINGO species and genus assignments, the RDP naive Bayesian classifier method and the Silva reference database were used to assign the ASV to a family or higher taxonomic rank. Eighteen ASVs remained unclassified; these were searched against the NCBI nucleotide collection (nt) database using megaBLAST83 and labeled as their closest hit at ≥ 97% identity (e.g. Uncultured_bacterium_5669ncd431b01c1). Sequence counts of ASVs assigned to the same taxon were combined to obtain the final abundance table. Sequences assigned to Shuttleworthia sp. were aligned manually against the NCBI nucleotide collection (nr/nt) database using megaBLAST and reassigned to BVBA-1, as previously described84.
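The acceptance rule and the merging of ASV counts by taxon can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the example labels and the handling of a score of exactly 0.5 (treated here as not accepted) are assumptions made for illustration only.

```python
from collections import defaultdict

# Threshold stated in the text; a score of exactly 0.5 is not covered by the
# text, so treating it as "not accepted" is an assumption of this sketch.
SCORE_THRESHOLD = 0.5


def pick_label(species, score, fallback_taxon):
    """Keep the species-level label when its similarity score is above the
    threshold; otherwise use a higher-rank label obtained elsewhere."""
    return species if score > SCORE_THRESHOLD else fallback_taxon


def collapse_by_taxon(asv_counts, asv_labels):
    """Sum the read counts of all ASVs that share the same assigned taxon."""
    table = defaultdict(int)
    for asv, n_reads in asv_counts.items():
        table[asv_labels[asv]] += n_reads
    return dict(table)


if __name__ == "__main__":
    # Hypothetical ASVs: (species call, similarity score, fallback taxon).
    calls = {
        "ASV1": ("Lactobacillus_iners", 0.93, "Lactobacillaceae"),
        "ASV2": ("Lactobacillus_iners", 0.81, "Lactobacillaceae"),
        "ASV3": ("Prevotella_bivia", 0.42, "Prevotellaceae"),
    }
    labels = {asv: pick_label(*call) for asv, call in calls.items()}
    counts = {"ASV1": 120, "ASV2": 30, "ASV3": 7}
    print(collapse_by_taxon(counts, labels))
```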

The ASV abundance table was rarefied to 1,655 reads per sample. Bacterial sequences were analyzed using the QIIME 1.9.1 platform85 and R v.3.3.2 packages86. The urbanization group comparison was performed only among Amerindian women (low, n = 15–19; medium, n = 20–22; and high, n = 20–22 urbanization groups; n values are for cervicovaginal and introital samples, respectively), controlling for the location variable (the woman's village). To make it possible to control for location, nearby communities (< 10 km of separation) were combined, for a total of 4 location areas. For the ethnicity comparison, the urbanization and village variables were fixed by including only women from the high urbanization group living in nearby villages (Puerto Ayacucho and Alto Carinagua, 10 km or 20 min. from each other by public transportation; Amerindians, N = 15–17 and mestizos, N = 27–24, for cervicovaginal and introital samples, respectively).
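Rarefaction to a fixed depth amounts to drawing the same number of reads from every sample without replacement. The Python sketch below shows the idea under stated assumptions: the function name, the seed and the example counts are hypothetical, and the rarefaction in the study was run with the tools named in the text, not with this code.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=0):
    """Draw `depth` reads without replacement from a dict of taxon -> reads.

    Returns None when the sample holds fewer than `depth` reads, since such
    samples cannot be rarefied to that depth.
    """
    if sum(counts.values()) < depth:
        return None
    # One list entry per read, labeled with the taxon it belongs to.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))


if __name__ == "__main__":
    # Hypothetical sample; 1655 is the depth stated in the text.
    sample = {"Lactobacillus_iners": 3000, "Gardnerella_vaginalis": 500}
    print(rarefy(sample, 1655, seed=1))
```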

Beta diversity analysis was performed with Bray–Curtis dissimilarities. The adonis2 function from the vegan87 R package was used to run non-parametric Permutational Multivariate Analysis of Variance (PERMANOVA88) to compare the variance between groups with the variance within groups (spatial location differences). Principal Coordinates Analysis (PCoA) plots were generated with the vegdist, betadisper, and plot functions in R, drawing one-standard-deviation ellipses by group. Alpha diversity was assessed with the Shannon and Simpson indices, fitting linear mixed-effect models (LMM) for urbanization analyses or using the Kruskal–Wallis test for ethnicity comparisons. The LMM (controlling for the woman's location) was fitted using the lme4 package89 with log10-transformed data to reach normality of the residuals.
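For readers unfamiliar with these measures, the Python sketch below computes the Bray–Curtis dissimilarity between two samples and the Shannon and Simpson indices of one sample. It is an illustration only: the natural logarithm for Shannon and the 1 − Σp² form of the Simpson index are assumptions (the text does not state which variants were used), and the taxon counts are hypothetical.

```python
import math


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two dicts of taxon -> count."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    den = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return num / den if den else 0.0


def shannon(counts):
    """Shannon index H = -sum(p * ln p), using the natural log (assumption)."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in counts.values() if n > 0)


def simpson(counts):
    """Simpson index in the 1 - sum(p^2) form (assumption)."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


if __name__ == "__main__":
    s1 = {"L_iners": 6, "G_vaginalis": 4}
    s2 = {"L_iners": 2, "L_crispatus": 8}
    print(bray_curtis(s1, s2), shannon(s1), simpson(s1))
```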

Bacterial cervicovaginal and introital community state types (CSTs) were generated by hierarchical clustering as described previously for vaginal samples58. Briefly, matrices of the square root of the Jensen-Shannon divergence between women, based on ASV proportions, were calculated using the textmineR R package90. Hierarchical clustering was performed with the hclust function and the ward.D method. The number of clusters was defined using the clValid package, with local modifications to allow the Jensen-Shannon dissimilarity (JSD) metric. The clValid function was set to explore between 2 and 10 possible clusters with the hierarchical clustering method, Ward's agglomeration method and internal validation with the Silhouette index (SI), which ranges from −1 (lowest confidence) to 1 (highest confidence)91. Additionally, prediction strength (PS)92 was measured with the philentropy R package93 using the same JSD matrix and hierarchical clustering. Cervicovaginal samples yielded four clusters with a maximum SI of 0.378 and a PS of 60%, and introital samples yielded two clusters with an SI of 0.466 and a PS of 77%. However, applying a PS criterion of ≥ 80% suggested two clusters for cervicovaginal samples and no clusters for introital samples. The SI validation method was chosen to define the final number of clusters.
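The two quantities behind this procedure, the square root of the Jensen-Shannon divergence and the mean silhouette width of a clustering, can be sketched in a few lines of Python. This is not the textmineR or clValid code: the base-2 logarithm, the function names and the toy profiles are assumptions, and the hierarchical clustering step itself is omitted.

```python
import math


def _kl(p, q):
    """Kullback-Leibler divergence in bits; zero-probability terms are skipped."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between two proportion
    vectors of equal length (base-2 log, so the result lies in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * _kl(p, m) + 0.5 * _kl(q, m)
    return math.sqrt(max(jsd, 0.0))


def mean_silhouette(dist, labels):
    """Mean silhouette width from a distance matrix and cluster labels.
    Requires at least two clusters; singleton clusters score 0."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical two-taxon profiles for four women.
    profiles = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
    dist = [[js_distance(p, q) for q in profiles] for p in profiles]
    print(mean_silhouette(dist, [0, 0, 1, 1]))
```

A silhouette value close to 1 indicates compact, well-separated clusters, which is why the number of clusters with the maximum SI is retained.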

Sampling coverage was estimated using the iNEXT94 R package, with datatype = "abundance" and nboot = 999. A Venn diagram was built using an online tool (https://bioinformatics.psb.ugent.be/webtools/Venn/). Cohen’s kappa coefficient95 and the percentage of coincidence among body site CSTs were calculated with GraphPad QuickCalcs Web (https://www.graphpad.com/quickcalcs/kappa2/), and kappa was interpreted as: < 0 less than chance; 0.01–0.20 slight; 0.21–0.40 fair; 0.41–0.60 moderate; 0.61–0.80 substantial; and 0.81–0.99 almost perfect agreement96. Discriminant taxa analyses were performed with LEfSe97, only for CST comparisons, using the one-against-all strategy.
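Cohen's kappa compares the observed agreement between two categorical assignments with the agreement expected by chance, kappa = (p_o − p_e) / (1 − p_e). The Python sketch below is a minimal illustration, not the GraphPad calculation; the function names and the example CST labels are hypothetical.

```python
import math
from collections import Counter


def percent_agreement(r1, r2):
    """Percentage of items given the same category in both assignments."""
    return 100.0 * sum(x == y for x, y in zip(r1, r2)) / len(r1)


def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa for two equal-length lists of categories."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(r1)
    p_o = sum(x == y for x, y in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from the marginal frequencies of each category.
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_e == 1.0:
        return math.nan  # undefined when both lists use a single category
    return (p_o - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Hypothetical CST labels for the same four women at two body sites.
    cervicovaginal = ["A", "A", "B", "B"]
    introital = ["A", "B", "B", "B"]
    print(cohen_kappa(cervicovaginal, introital),
          percent_agreement(cervicovaginal, introital))
```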

To explore L. iners variants (16S rRNA gene amplicon sequence variants, ASVs, classified within L. iners) and their possible associations with the study’s variables, three of the most prevalent L. iners ASVs were compared across urbanization, ethnicity and HPV infection using Fisher’s exact test.
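As an illustration of the test, the Python sketch below computes a two-sided Fisher's exact p-value for a 2 × 2 table, for example presence or absence of one ASV against HPV status. It covers only the 2 × 2 case (comparisons across three urbanization groups need the general r × c form), and the function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Probability of x counts in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so that ties with p_obs are included.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


if __name__ == "__main__":
    # Hypothetical counts: ASV present/absent (rows) by HPV +/- (columns).
    print(fisher_exact_2x2(3, 0, 0, 3))
```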

Statistical analyses for CSTs within urbanization groups included a log-linear model fitted with R base functions, Fisher’s exact test and pairwise Fisher's exact tests, performed with the fmsb R package98 for categorical variables. Lifestyle variable analyses were performed with ANOVA, TukeyHSD, Kruskal–Wallis and pairwise Kruskal–Wallis tests for continuous variables. Adjustment for multiple comparisons was performed using the Benjamini–Hochberg false discovery rate (FDR-BH) method99. Statistics and graphics were also produced using the reshape2100, ggplot2100, and gplots101 R packages. For body site comparisons, QIIME was used to build taxa bar plots and to visualize UniFrac distance matrices in PCoA plots through EMPeror102.
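The Benjamini–Hochberg adjustment multiplies each p-value by m / rank (m tests, rank by ascending p-value) and then enforces monotonicity from the largest p-value downwards. The Python sketch below illustrates that calculation; the function name and the example p-values are hypothetical, and the study's adjustments were made in R.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping a running
    # minimum so adjusted values never decrease with increasing rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values from four pairwise comparisons.
    print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```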