Diversity of vaginal microbiome and metabolome during genital infections

We characterized the vaginal ecosystem during common infections of the female genital tract, as vulvovaginal candidiasis (VVC, n = 18) and Chlamydia trachomatis infection (CT, n = 20), recruiting healthy (HC, n = 21) and bacterial vaginosis-affected (BV, n = 20) women as references of eubiosis and dysbiosis. The profiles of the vaginal microbiome and metabolome were studied in 79 reproductive-aged women, by means of next generation sequencing and proton based-nuclear magnetic resonance spectroscopy. Lactobacillus genus was profoundly depleted in all the genital infections herein considered, and species-level analysis revealed that healthy vaginal microbiome was dominated by L. crispatus. In the shift from HC to CT, VVC, and BV, L. crispatus was progressively replaced by L. iners. CT infection and VVC, as well as BV condition, were mainly characterised by anaerobe genera, e.g. Gardnerella, Prevotella, Megasphaera, Roseburia and Atopobium. The changes in the bacterial communities occurring during the genital infections resulted in significant alterations in the vaginal metabolites composition, being the decrease of lactate a common marker of all the pathological conditions. In conclusion, according to the taxonomic and metabolomics analysis, we found that each of the four conditions is characterized by a peculiar vaginal microbiome/metabolome fingerprint.


Species-level analysis for Lactobacillus genus
Classification of reads belonging to Lactobacillus, the most important bacterial genus colonizing the vaginal enrironment, was further improved, where possible, down to the species level, via a BLASTbased [S1] re-classification on an ad-hoc built reference database. Due to the high similarity among Lactobacillus species in V3-V4 region of 16S rRNA, especially between L. crispatus (a well-known member of vaginal flora) and L. gallinarum ([S2], isolated in chicken crop and feces), we did not consider the whole set of Lactobacillus references available from NCBI RefSeq database for bacteria (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/), which comprised, as of 2018, a total of 518 strains.

Reference sequences
Genome sequences used for the classification comprised a total of 17 species among the most frequently found within the vaginal environment, for a total of 282 strains. Through a custom script, sequenced genomes for all species and strains were downloaded and properly formatted for further processing. In all our analyses, only bacterial strains with a genome finishing grade of "Complete", "Chromosome" or "Scaffolds" were considered. The following table summarizes the references used for each species. Reads to re-classify From the OTU table comprising all the samples, OTUs classified within the Lactobacillus genus were selected and the sequences of all the reads grouped in each OTU (clustered at 97% similarity) were retrieved. In order to reduce the number of sequences to re-classify, clonal reads (i.e.: reads being identical throughout 100% of their length and composition) were grouped together.

Classification
Re-classification of the reads was performed through nucleotide BLAST (legacy BLAST, v 2.26), using a cutoff of 1e-10 for the e-value and de-activating the dust-filter. Only reads matching for at least of 80% of their length were retained and, for each read, the best match (i.e.: that or those with the higher bit-score) was selected. If a read had multiple classifications on different species, the classification was reset to genus level.

Statistical analysis
In order to keep only consistent data for species-level evaluations, only samples having a relative abundance of Lactobacillus genus higher than 1%, were considered. This was made to exclude samples with very few reads classified in the genus that could profoundly alter the dataset (e.g.: considering a sample in which we had only 1 read in a genus, this would have brought a 100% to the species-level classification for that certain species). Since the least sequenced sample had about 12000 reads, this equaled having at least 120 reads in Lactobacillus genus; 75 out of the 79 samples were, thus, considered. As expected, all the 4 excluded samples were from women affected by bacterial vaginosis (BV), which is characterized by a dramatic decrease of Lactobacillus species and a consequent proliferation of other micro-organisms. A non-parametric Mann-Whitney U-test was used to compare the relative abundance of each bacterial species in the different experimental groups, considering pvalues <0.05 as significant. Statistical evaluations were carried out using Matlab (v 2008b, Natick, MA, USA)

Co-abundance network analysis
Bacterial genera were selected considering only those present at >0.5% of abundance in at least 30% of the samples in at least one experimental group, in order to exclude minor and transient contributors of the gut microbiota. This resulted in a subset of 24 genera using the whole dataset, having a relative across all samples ranging from 55.80% to 0.17%. All statistical evaluations and heatmaps were carried out using Matlab and the Fathom Toolbox [S3] and visualized by Cytoscape (v 3.0, [S4]) The co-abundance between each pair of genera was evaluated calculating the Spearman's correlation coefficient and displayed as heatmaps, hierarchically clustered using Euclidean correlation metric and average linkage. Only associations having a Benjamini-Hochberg adjusted p-value <0.05 for the linear model were used to build for the hierarchical clustering. Results obtained considering the entire dataset of samples were used to define the co-abundance groups (CAGs). Permutational multivariate analysis of variance (P-MANOVA, [S5]) was used to determine whether CAGs were significantly different from each other. Essentially this compared strength of the correlations between the groups to correlation strengths within the groups in a pairwise manner. All comparisons, performed via 9999 random permutations, except for comparison between CAGs 2 vs. 3 and 2 vs. 4, had a p-value<0.05 for rejecting the hyphothesis of no-difference among groups. The whole PERMANOVA analysis resulted highly significant (p<0.001). Co-abundance network plots were created as previously described ( [S6]). In the plots, circle and label size is proportional to the genus' relative abundance in the experimental group or along the whole dataset; circle colors represent the CAGs clusters; red edges suggest a positive correlation between genera, whereas blue edges represent negative correlation; edge thickness is proportional to the Spearman's correlation coefficient.

Supplementary References
[S1] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Supplementary Figure 1. Definition of bacterial Co-abundance groups (CAGs). (a) Heatmap used to define CAGs, showing the Spearman correlation coefficient between genera and hierarchically clustered on the basis of Euclidean distance and Ward linkage. Only genera present at least at 0.5% relative abundance in at least 30% of the samples per experimental condition (i.e.: HC, VVC, CT, BV) are shown. Clustering is performed only on genera whose correlation is statistically different from 0 (pvalue of the linear model <0.05). (b) Network plot highlighting correlation relationships of CAGs for the whole cohort studied (n=79). Circle sizes indicate genus abundances and line thickness is proportional to correlation value. Red lines indicate a positive correlation value; blue lines a negative one. (c) Bar plots showing the average cumulative relative abundance of each CAG in the microbiota of the subjects for each experimental group. In grey, the portion of genera not belonging to the identified CAGs due to the initial filtering is represented.