Endocervical and vaginal microbiota in South African adolescents with asymptomatic Chlamydia trachomatis infection

Adolescent girls and young women represent a key risk group for sexually transmitted infections (STIs). The vaginal microbiota is thought to play an important role in susceptibility to STIs such as Chlamydia trachomatis. We compared the microbiota of the lateral vaginal wall and endocervix, and assessed associations with C. trachomatis infection in South African adolescents. The endocervical and vaginal lateral wall microbiota were characterized by amplifying and sequencing the V4 region of the 16S rRNA gene, and C. trachomatis was diagnosed using molecular methods. Of the 72 girls included, 30 had asymptomatic C. trachomatis infections. Three major vaginal community types were identified: one dominated by Lactobacillus crispatus, one dominated by L. iners and one diverse community dominated by Gardnerella vaginalis. The microbiota of the endocervix was significantly different from that of the lateral wall in terms of diversity. There were many differentially abundant taxa between the endocervix and lateral vaginal wall, including Achromobacter spanius and Enterococcus faecium. Women with C. trachomatis had higher relative abundance of G. vaginalis and other anaerobes. In this African adolescent cohort, significant differences between the lateral vaginal wall and endocervical microbiota diversity and composition were evident, although neither was strongly associated with C. trachomatis infection.

of common vaginal bacterial community types, which vary according to population 2,3 . In general, low-diversity communities dominated by a single Lactobacillus species (Lactobacillus crispatus, L. gasseri, L. iners, or L. jensenii) and one to two high-diversity communities, comprised of a mixture of anaerobic bacteria associated with BV, are described [3][4][5][6] . Lactobacillus species are thought to protect the lower genital tract from invading pathogenic microorganisms by competing for nutrients, secretion of bacteriostatic and microbicidal compounds and by lowering the pH (<4.5) through lactic acid production.
Several studies have demonstrated a positive association between BV and the incidence of STIs, including C. trachomatis, Neisseria gonorrhoeae and Human Immunodeficiency Virus (HIV) [7][8][9][10] . Recent molecular studies have shown that women with C. trachomatis infection are more likely to have L. iners-dominated or dysbiotic vaginal communities compared to those without [11][12][13] . Although less well studied, the composition of the endocervical microbiota may be of greater importance when looking at the relationship between commensal bacteria and STIs, as the cervix is the main site of infection for several pathogens including N. gonorrhoeae and C. trachomatis 14 .
Whereas the vagina has stratified squamous epithelium, C. trachomatis infects the columnar epithelial cells of the cervix. The cervical transformation zone is enriched with T cells and antigen-presenting cells compared to the vagina 15 , and differing immune and physiological environments (including different oxygen levels) could impact the microbial composition in different regions of the lower FGT. Molecular studies characterizing bacterial communities at different anatomical sites throughout the FGT have generated varying results with regard to microbial homogeneity [16][17][18] . Microbial communities in the cervix that are distinct from those of the vagina have been described 19 . It is therefore of great importance to characterize endocervical microbial communities when studying the role that commensal bacteria play in C. trachomatis infection. To our knowledge, few molecular studies have evaluated the relationship between prevalent C. trachomatis infections and the composition of cervicovaginal microbiota, and none have focused on African adolescents, a group at extreme risk for STIs. Therefore, we compared the endocervical and vaginal microbiota and evaluated their association with C. trachomatis infection in young South African females from a high-risk community.

Results
Study population characteristics. Of the 154 lateral vaginal wall and endocervical samples taken from 77 participants, 149 samples passed the sequencing and quality control (≥5000 reads/sample, 3 endocervical and 2 lateral wall samples failed). Downstream analyses were conducted on 72 participants for whom both the lateral wall and matched endocervical samples passed quality control measures. Of those 72 participants, 42 (58%) were C. trachomatis negative and 30 (42%) C. trachomatis positive. None of the 30 women infected with C. trachomatis had LGV serovar infections. Nugent scores were available for all 72 participants. Nearly half of the participants (49%, n = 35) were BV positive (Nugent score 7-10), 5% were BV intermediate (Nugent score 4-6, n = 4) and the remaining 46% were BV negative (Nugent score 0-3, n = 33) ( Table 1). The C. trachomatis positive and negative participants were similar in terms of most demographic and behavioral characteristics including age, body mass index (BMI), intravaginal practices and hormonal contraceptive use. Participants with BV were almost twice as likely to be infected with C. trachomatis compared to BV negative participants, but these observations were not statistically significant (OR = 1.8; 95% CI: 0.71-4.98, p = 0.21).
Adolescents infected with C. trachomatis were less likely to report multiple sexual partners and regular condom use (over the past year) and reported fewer lifetime sexual partners than C. trachomatis negative participants, although these differences were not significant. C. trachomatis positive participants were, however, significantly more likely to be co-infected with N. gonorrhoeae than C. trachomatis negative participants (23% versus 0.5%, p = 0.03, Table 1).
To evaluate the differences in overall bacterial community composition between sampling sites, principal coordinates analysis (PCoA) using weighted UniFrac distances was performed (Fig. 1b). The samples clearly grouped into three distinct clusters corresponding to the three community types described above (Fig. 1b). However, within the three clusters the samples separated according to sampling site (Fig. 2a; Adonis p = 0.002). Furthermore, lateral wall samples had significantly higher species richness and alpha diversity compared to endocervical samples (p < 0.001, Fig. 2b). Yet, for 82% of the participants, the lateral wall and endocervical samples were assigned to the same community cluster, and the within-participant variability was significantly less than the variability of samples from the same site between participants, based on mean weighted UniFrac distances (p < 0.001 for endocervical and lateral wall, Student's paired t-test, Fig. 2e).
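The within- versus between-participant comparison can be illustrated with a minimal sketch. It assumes a hypothetical `distance` callable standing in for weighted UniFrac (which is not implemented here), and the function name, sample keys and site labels are illustrative rather than taken from the study. For each participant it pairs the distance between her two sites with her mean distance to other participants at the same site, then computes the paired t statistic; the p-value lookup is omitted.

```python
from math import sqrt
from statistics import mean, stdev


def paired_within_between(samples, distance, site, other_site):
    """Pair each participant's within-subject distance with her mean
    between-subject distance at `site`, and return the paired t statistic.

    samples  : dict mapping (participant, site) -> community profile
    distance : callable(profile_a, profile_b) -> float; a stand-in for
               weighted UniFrac (assumption, not implemented here)
    """
    participants = sorted({p for p, _ in samples})
    within, between = [], []
    for p in participants:
        # Distance between the two sites of the same participant
        within.append(distance(samples[(p, site)], samples[(p, other_site)]))
        # Mean distance to every other participant at the same site
        between.append(mean(
            distance(samples[(p, site)], samples[(q, site)])
            for q in participants if q != p
        ))
    diffs = [w - b for w, b in zip(within, between)]
    # Paired t statistic: mean difference over its standard error
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return within, between, t_stat
```

A negative t statistic here means that samples from the same participant are, on average, closer to each other than to same-site samples from other participants.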
In order to identify differentially abundant bacterial species by anatomical site, metagenomeSeq analysis was applied. The relative abundances of 31 taxa (OTUs merged at lowest taxonomic level) were significantly different between endocervical and lateral wall samples (Fig. 2c). Random forest analysis was additionally used to identify species predictive of sampling site. In concordance with the metagenomeSeq results, the most influential taxa in differentiating endocervical from lateral wall samples included A. spanius_cluster, E. faecium, M. aerolatum, Peptoniphilus harei_asaccharolyticus_grossensis and Megasphaera, which were more prominent in endocervical samples, and Anaerococcus, F. magna, Dialister and Corynebacterium spp., which were more common in lateral wall samples (Fig. 2d, AUC = 0.98, PPV = 0.96, NPV = 0.90 for the test set and a validation predicted error rate of 6.25%).

Relationship between genital tract microbiota and C. trachomatis infection. The association between C. trachomatis infection and microbiota was considered separately for lateral wall and endocervical samples. The alpha diversity of endocervical microbiota tended to be higher in C. trachomatis positive participants compared to C. trachomatis negative participants, although not significantly (1.8 versus 1.1, p = 0.28, Table 1); this was also the case in lateral wall samples (p = 0.32, data not shown). In univariate analysis, participants with a community type dominated by diverse anaerobic bacteria (C1) or L. iners (C3) were more likely to be infected with C. trachomatis compared to those having an L. crispatus-dominated community, although not significantly so (OR = 2.98; 95% CI: 0.76-15.0 and OR = 2.50; 95% CI: 0.56-13.6, respectively). None of the study participant demographic or behavioral factors listed in Table 1 were associated with C. trachomatis in univariate or multivariate analyses.
On the other hand, differential abundance testing (using metagenomeSeq) revealed many significantly differentially abundant cervical taxa between C. trachomatis positive versus negative women, including, as expected, higher relative abundance of N. gonorrhoeae, but also Rs-045, Sutterella sanguinus_morbirenis, Porphyromonas somerae and Corynebacterium urealyticum (Fig. 3b). Using random forest analysis, these same species, as well as G. vaginalis, Aerococcus christensenii, Dialister spp., Megasphaera, A. vaginae and Prevotella disiens, were found to be predictive of C. trachomatis infection (Fig. 3b). However, the predictive model had an AUC of 0.46 and a predicted error rate of 54.17%. Many of these same genera were also predictive of C. trachomatis infection status in lateral wall samples, with Prevotella spp. and Megasphaera being the strongest predictors (Supplementary Figure 1); however, this predictive model was very weak, with a predicted error rate of 66.67% and an AUC of 0.29 for the test set. In concordance with this, no overall differences in microbiota composition (β-diversity) were found between C. trachomatis positive and C. trachomatis negative participants (Fig. 3c, Adonis p = 0.27 and p = 0.58 for endocervical and lateral wall, respectively).

Discussion
Here, we find significant differences between endocervical and vaginal microbiota, and identify associations between cervical bacterial taxa and C. trachomatis infection in African adolescents. Although the overall bacterial community compositions were found to be more similar between the endocervix and vagina within each participant than between participants, we found significantly different β-diversity between endocervical and vaginal samples within clusters. Furthermore, consistent with the results of others 21 , we found the within-sample diversity in the vagina to be higher than in the ectocervix.
Although previous studies have suggested that the microbiota in different anatomical sites within the FGT of adults is similar in composition [16][17][18] , we identified multiple taxa that were significantly more abundant in the vagina. These included anaerobes, such as Peptoniphilus, Anaerococcus and Fusobacterium spp. The endocervix also had significantly more E. faecium, the source of which is likely the gastrointestinal tract. These findings are surprising, since the endocervix is further from the external (aerobic) environment and the rectum, than the vagina. In Chinese women 19 , L. crispatus was notably more abundant in the vagina compared to the endocervix, which may be due to epithelial cell characteristics. L. crispatus expresses an adhesion molecule which mediates adherence to stratified squamous epithelium (found in the vagina) but not to columnar epithelium which constitutes the cervical epithelium 22 . Therefore, factors other than geography likely play a role in microbial ecosystems in the FGT. Since adolescence is a time of change in the cervix, it is possible that the differences we see here are unique to this age group. Alternatively, differences in sample collection technique between lateral vaginal wall and endocervical swabs may have occurred.
No studies have explored the relationship between specific taxa in the endocervix and C. trachomatis in adolescents. Our findings suggest the endocervical microbiota diversity is not grossly altered in women with C. trachomatis, although the study was small. However, we found some taxa differentially abundant between C. trachomatis infected and uninfected women, including G. vaginalis, Megasphaera, A. vaginae, Dialister spp. and Prevotella spp., all BV-associated bacteria. Several studies have reported an increased risk of C. trachomatis infection among women with BV 2,5,23 . In a longitudinal study among Dutch women, microbiota dominated by L. iners was an independent risk factor for later C. trachomatis acquisition 12 , and L. crispatus may have been protective [11][12][13][14] . The correlation between lactobacilli and vaginal health is often attributed to the production of lactic acid resulting in low vaginal pH. Importantly, L. crispatus produces significantly more D-lactate than L. iners, and this difference has been shown to be important in trapping of pathogens 24 . Indeed, L. crispatus has been shown to inhibit growth, adhesion and infectivity of C. trachomatis in vitro, attributed to the production of D-lactate [25][26][27] . Of note, L. iners is far more prevalent than other lactobacilli in African women, often co-exists with BV 9,20 , and may be one of the reasons why women of African descent are at high risk for STIs. One limitation of this study is its cross-sectional nature, which precludes definitive conclusions regarding whether specific taxa directly increase risk of C. trachomatis infection. Prospective longitudinal cohort studies will allow comparison of microbial communities prior to and after infection with C. trachomatis, which will allow us to draw conclusions regarding the role of microbiota in modifying C. trachomatis risk in adolescents. Nonetheless, the potential for altered functional profiles of C. trachomatis susceptible bacterial communities between different anatomical sites exists, despite overall compositional similarities between women with and without C. trachomatis infections.

Methods
Study participants and sample collection. The study population consisted of 149 16-22-year-old women from a low income, high population density community in Cape Town, South Africa. Participants were enrolled in the Women's Initiative in Sexual Health (WISH) study between November 2013 and December 2014. Detailed procedures and characteristics of the cohort have previously been described 28 . Approval for the study was obtained from the Human Research Ethics Committee of the University of Cape Town and all study procedures were conducted in accordance with the International Conference on Harmonization (ICH) and the South African Good Clinical Practice Guidelines 29 . All participants 18 years or older provided informed consent, while informed parental consent and participant assent were obtained for those younger than 18 years. Females were enrolled if they were HIV-negative, in good health, not pregnant or menstruating at the time of sampling, and if they had not had unprotected sex or douched in the last 48 hours, nor taken antibiotics in the prior two weeks. The women were followed longitudinally every two months if they were using Norethisterone enanthate (NET-EN) injectable contraceptive, combined oral contraceptives or barrier contraception only, or every three months if they were using depot medroxyprogesterone acetate (DMPA) injectable contraceptive, for a total of three visits. Study visits were scheduled two weeks after injection for participants on injectable contraceptives, or otherwise during the luteal phase of their menstrual cycles (between day 14-28) if they were not using hormonal contraceptives or if they were using oral contraceptives. Prior to any specimen collection, an HIV rapid test (Alere Determine™ HIV-1/2 Ag/Ab Combo, Alere, Waltham, MA), a pregnancy test (U-test Pregnancy strip, Humor Diagnostica, Pretoria, South Africa) and a general physical examination were performed.
At each visit, vulvovaginal swabs for STI testing and Nugent scoring, as well as swabs from the lateral vaginal wall and endocervix for microbiome analyses, were obtained.
STI and BV testing. Vulvovaginal swabs were tested with a variety of in-house multiplex and commercial PCR assays (Chlamydia trachomatis, Neisseria gonorrhoeae, Trichomonas vaginalis, Mycoplasma genitalium, herpes simplex virus type 1 (HSV-1) and HSV-2, Haemophilus ducreyi and Treponema pallidum) as previously described 21 . Positive C. trachomatis DNA extracts were additionally tested with an LGV-specific PCR to differentiate chlamydial infections into lymphogranuloma venereum (LGV)-associated serovars or non-LGV serovars. Blood was obtained for HIV rapid testing and HSV-2 serology. Endocervical swabs were collected for human papillomavirus (HPV) detection and genotyping by Roche Linear Array 30 . Posterior fornix swabs were collected for Nugent scoring to classify samples as BV negative (Nugent 0-3), intermediate (Nugent 4-6) or positive (Nugent 7-10), and vaginal pH was measured using colour-fixed indicator strips (Macherey-Nagel, Düren, Germany).

16S rRNA gene amplification and Illumina MiSeq sequencing. Vaginal lateral wall and endocervical swabs were collected for microbiome analysis by 16S rRNA gene sequencing. Matched lateral wall and endocervical swabs from the first sample visit were thawed and treated with an enzyme cocktail consisting of mutanolysin (25kU/ml, Sigma Aldrich, Modderfontein, RSA), lysozyme (450kU/ml, Sigma Aldrich), and lysostaphin (4kU, Sigma Aldrich) for 1 hour at 37 °C. This was followed by mechanical disruption using the Thermo Savant FastPrep 120 Cell Disrupter system for 3 × 30 seconds at speed setting 5.5 m/s. Microbial DNA was extracted using the PowerSoil DNA Isolation kit (Mo Bio Laboratories Inc., Germantown, MD, USA) following the manufacturer's protocol. The V4 hypervariable region of the bacterial 16S rRNA gene was amplified using modified universal primers 31 . Pooled duplicate samples were purified using Agencourt AMPure XP beads (Beckman Coulter, Brea, CA, United States) and quantified using the Qubit dsDNA HS Assay (Life Technologies, Carlsbad, CA, USA). Illumina sequencing adapters and dual-index barcodes were added to the purified amplicon products using limited cycle PCR and the Nextera XT Index Kit (Illumina, San Diego, CA, USA). Amplicons from 96 samples and controls were pooled in equimolar amounts and the resultant libraries purified by gel extraction (Qiagen, Hilden, Germany) and quantified. The libraries were sequenced on an Illumina MiSeq platform (300 bp paired-end) with v3 chemistry.
16S rRNA gene sequencing analysis. Following demultiplexing, raw reads were preprocessed as follows: forward and reverse reads were merged using usearch7 32 , allowing a maximum of three mismatches, and the resultant merged reads were quality filtered using usearch7 (reads with E scores larger than 0.1 were discarded). Next, primer sequences were removed using a custom Python script and the reads truncated at 250 bp. Sequences were then de-replicated whilst recording the level of replication for each sequence using usearch7. De-replicated sequences were sorted by abundance (highest to lowest) and clustered de novo into operational taxonomic units (OTUs) at 97% similarity using usearch7. Chimeric sequences were detected (against the Gold database) using UCHIME 33 and removed. Individual sequences were assigned to the specific identifiers using a 97% similarity threshold. Taxonomic assignment was performed in QIIME 1.8.0 34 using the RDP classifier (using the default confidence level of 0.5) against the GreenGenes 13.8 reference taxonomy for 97% identity. To increase species-level resolution, we used the usearch_global command implemented in VSEARCH 35 to search the de novo picked OTUs' representative sequences against a Custom Vaginal 16S Reference Database (described previously 20 ). All hits with ≥97% identity were accepted. The remaining OTUs (n = 109) were manually curated using both BLAST on NCBI's nucleotide database (excluding uncultured organisms) and usearch_global against the Vaginal 16S rDNA Reference Database 36 . Samples with ≥5000 reads were selected for downstream analyses. The OTU table was normalized (i.e. transformed to relative abundance * median sample read depth), and filtered to include OTUs with at least 10 counts in at least 20% of samples or a relative abundance of at least 0.001%.
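The read-depth cut-off, normalization and OTU filter described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the function name and data layout are hypothetical, the filter is assumed to be applied to the normalized counts, and "relative abundance of at least 0.001%" is assumed to mean the OTU's share of all normalized counts across retained samples.

```python
from statistics import median


def normalize_and_filter(counts, min_depth=5000, min_count=10,
                         min_prevalence=0.20, min_rel_abund=1e-5):
    """counts: dict mapping sample -> {otu: raw read count}."""
    # Keep only samples with at least `min_depth` reads
    kept = {s: otus for s, otus in counts.items()
            if sum(otus.values()) >= min_depth}
    if not kept:
        return {}
    depths = {s: sum(otus.values()) for s, otus in kept.items()}
    med = median(depths.values())
    all_otus = sorted({o for otus in kept.values() for o in otus})

    # Normalize: relative abundance * median sample read depth
    norm = {s: {o: kept[s].get(o, 0) / depths[s] * med for o in all_otus}
            for s in kept}

    # Filter: >= min_count in >= min_prevalence of samples, OR overall
    # relative abundance >= min_rel_abund (0.001% = 1e-5; assumed reading)
    n_samples = len(norm)
    grand_total = sum(sum(profile.values()) for profile in norm.values())
    keep = []
    for otu in all_otus:
        prevalence = sum(norm[s][otu] >= min_count for s in norm) / n_samples
        overall = sum(norm[s][otu] for s in norm) / grand_total
        if prevalence >= min_prevalence or overall >= min_rel_abund:
            keep.append(otu)
    return {s: {o: norm[s][o] for o in keep} for s in norm}
```

After normalization every retained sample sums to the median read depth, so the count-based prevalence threshold is applied on a common scale.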

Statistical analyses. All downstream statistical analyses were performed in RStudio 37 using the packages phyloseq 38 for beta diversity analyses, metagenomeSeq 39 for differential abundance testing, vegan 40 for ordinations and redundancy analysis, and pheatmap 41 and NMF 42 for annotated heat maps. Microbiota subtypes were established by fuzzy clustering using the R package 'cluster' 43 with k = 3 (optimal k), a membership exponent of 1.25 and weighted UniFrac as the dissimilarity measure. Members with a probability of less than 60% of belonging to any of the three clusters were excluded from downstream analyses (n = 4).
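The membership cut-off applied after fuzzy clustering can be illustrated with a short sketch. It assumes the membership probabilities have already been computed (the clustering itself is not reproduced here); the function name, the input layout and the 1-based cluster numbering are illustrative assumptions.

```python
def assign_clusters(memberships, min_prob=0.60):
    """Assign each sample to its most probable cluster, excluding samples
    whose highest membership probability is below `min_prob`.

    memberships: dict mapping sample -> list of membership probabilities,
                 one per cluster (here three)
    """
    assigned, excluded = {}, []
    for sample, probs in memberships.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] < min_prob:
            excluded.append(sample)        # ambiguous membership
        else:
            assigned[sample] = best + 1    # 1-based label (arbitrary order)
    return assigned, excluded
```

With a membership exponent close to 1, most samples receive a near-crisp assignment, so only samples lying between clusters fall below the threshold.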
SCIENTIFIC REPORTS | (2018) 8:11109 | DOI:10.1038/s41598-018-29320-x
Differences in study population characteristics were tested using the Chi-square, Fisher's exact (when the expected value was < 5) or Mann-Whitney U test, as appropriate. To examine whether the community type (C1-3) or any demographic or behavioral characteristic was associated with C. trachomatis infection, we performed univariate and multivariate logistic regression analyses. Variation partitioning to investigate factors influencing microbial composition was performed by analysis of variance using distance matrices (adonis) with 999 permutations using weighted UniFrac distance measures. Specific differences in microbial composition between groups were assessed using metagenomeSeq's MRfulltable function with a custom filter to determine significance. Merged taxa were deemed significantly different if they exhibited a fold change (beta coefficient) of ≥1.25, had an adjusted p-value of ≤0.05 and if at least one of the two groups being compared had ≥20% of samples with the given OTU/taxa OR the Fisher's exact test result was significant (after multiple testing correction (MTC) by the Benjamini-Hochberg method). OTUs were first merged at the lowest available taxonomic level using a custom script 20 . Random forests analyses were conducted on merged OTUs using the R package randomForest 44 . For randomForest, the data were randomly divided into training and test sets, comprising two thirds and one third of the data, respectively. Ecologic distances between lateral wall and endocervical samples from the same participants were compared with distances between samples from the same site from different participants by comparing within-subject pair-wise distances (weighted UniFrac) with the mean of between-subject pair-wise distances using paired t-tests.
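The custom significance filter can be read as a simple decision rule, sketched below together with a Benjamini-Hochberg adjustment. This is one plausible reading, not the authors' script: the function names are hypothetical, the fold-change threshold is assumed to apply to the absolute beta coefficient, and the presence criterion and the Fisher's exact criterion are assumed to be alternatives to each other, with both the fold-change and adjusted p-value criteria required.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity can be enforced
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_significant(fold_change, adj_p, prevalence_a, prevalence_b,
                   fisher_adj_p, min_fc=1.25, alpha=0.05, min_prev=0.20):
    """Apply the taxon-level filter (assumed grouping of the criteria)."""
    passes_effect = abs(fold_change) >= min_fc and adj_p <= alpha
    passes_presence = (max(prevalence_a, prevalence_b) >= min_prev
                       or fisher_adj_p <= alpha)
    return passes_effect and passes_presence
```

Requiring a minimum prevalence or a significant Fisher's exact test guards against calling a taxon differentially abundant when it is driven by a handful of samples.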