Taxonomic and Functional Differences in Cervical Microbiome Associated with Cervical Cancer Development

The cervical microbiome is associated with cervical cancer risk, but how microbial diversity and functional profiles change in cervical cancer remains unclear. Herein, we investigated microbial-compositional and functional differences between a control group and a high-grade cervical intraepithelial neoplasia and cervical cancer (CIN2/3-CC) group. After retrospective collection of 92 cervical swab samples, we carried out 16S rRNA amplicon sequencing on 50 and 42 samples from the control and CIN2/3-CC groups, respectively. The EzBioCloud pipeline was applied to identify the genomic features associated with the groups using 16S rRNA data. A linear discriminant analysis effect size (LEfSe) was performed to assess the enrichment in the assigned taxonomic and functional profiles. We found a lower richness in the control group relative to the CIN2/3-CC group; however, the β-diversity tended to be similar between the groups. The LEfSe analysis showed that a phylum Sacchaaribacteria_TM7, 11 genera, and 21 species were more abundant in the CIN2/3-CC group and that one uncharacterized Gardnerella species was more abundant only in the control group. Further characterization of the functional pathways using EzBioCloud showed that the 4 KEGG orthologs (Phosphotransferase system [PTS] sucrose-specific IIA, IIB, IIC components and PTS cellubiose-specific IIC component) were involved in the KEGG pathway of starch and sucrose metabolism. The two pathways of folate biosynthesis and oxidative phosphorylation were more abundant in the CIN2/3-CC group. Further confirmation of these results in larger samples can help to elucidate the potential association between the cervical microbiome and cervical cancer.

Richness and diversity in cervical microbiome. We compared the microbiome richness and diversity between the control and CIN2/3-CC groups (Fig. 1). The microbial richness of cervical swabs evaluated at the species level was significantly higher in the CIN2/3-CC group than in the control group, as measured by Chao 1 (p = 0.03), and the number of operational taxonomic units (OTUs) found in the microbiome taxonomic profile (MTP) index (p = 0.017) was higher in the CIN2/3-CC group as well. Conversely, the diversity index, as calculated using the Shannon, Simpson, and Bray-Curtis indices, did not differ significantly between the groups at the phylum or genera level.
Identification of cervical microbiome between control and CIN2/3-CC groups. We found that seven phyla (Firmicutes, Actinobacteria, Bacteriodetes, Proteobacteria, Fusobacteria, Tenericutes, and Saccharibacteria_ TM7) were all highly abundant (with averages higher than 0.1%) in the cervical swab samples ( Table 2). The phylum Firmicutes were most abundant in both groups. The phylum Saccharibacteria_TM7 was less abundant in the CIN2/3-CC group (p = 0.002) than in the control group. We explored the relative abundances of genus Lactobacillus in the groups using the Wilcoxon rank-sun test. However, none of the identified Lactobacillus species reached the level of significant difference between the control and CIN2/3-CC groups ( Supplementary Fig. 1). Identification of metabolic-functional pathways between control and CIN2/3-CC groups. We carried out a LEfSe analysis to discover the most relevant functional pathways responsible for the differences between the control and CIN2/3-CC groups. Among the 224 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, three remarkably differed between the two groups (Fig. 3A). The starch and sucrose metabolism Bray-Curtis indices. The Shannon and Simpson α-diversity indices were applied to estimate the diversity for each group using the Wilcoxon rank-sum test. Beta diversity was calculated with Bray-Curtis distances based on the taxonomic abundance profiles. Permutational multivariate analysis of variance (PERMANOVA) was applied to measure the statistical significances of β-diversity. www.nature.com/scientificreports www.nature.com/scientificreports/ pathway (ko00500) was significantly (p = 0.02) abundant in the control group, while the KEGG pathways of oxidative phosphorylation (ko00190, p = 0.008) and folate metabolism (ko00790, p = 0.04) were significantly abundant in the CIN2/3-CC group. Of the 2860 KEGG orthologs (KOs), 30 showed significant inter-group differences (Fig. 3B). Among those 30 KOs, only 4 KEGG orthologs (i.e., DNA (cytosine-5)-methyltransferase 1, putative transposase, 1, 4-dihydroxy-2-naphthoate octaprenyl transferase, periplasmic protein TonB) were more abundant in the CIN2/3-CC group, whereas the remaining 26 were more abundant in the control group (Fig. 3B). Four of the KO (Phosphotransferase system [PTS] sucrose-specific IIA, IIB, IIC components and PTS cellubiose-specific IIC component) were involved in the KEGG pathway of starch and sucrose metabolism (ko00500) ( Table 4).

Discussion
To investigate the microbial dysbiosis associated with CIN progression to invasive cancer, we first evaluated the average relative abundances of phyla in the cervical microbiome. Secondly, we compared the diversity indices among the datasets to find whether the control group showed higher species diversity relative to the CIN2/3-CC group. Finally, we tried to identify differentially abundant taxa and metabolic pathways between the control and CIN2/3-CC groups. For compositional differentiation, we noted the appearance of the phylum Saccharibacteria_ TM7 and the double augmentation of certain phyla such as Bacteriodetes, Fusobacteria, Proteobacteria, and Tenericutes in the CIN2/3-CC group compared with the control group. These results were supported by previous findings pointing to the overrepresentation of Firmicutes at low abundance in patients with cervical lesions and Tenericutes, which indicated an increase in proportion with increasing severity of CIN grade 19 . We observed significant depletion of certain phyla such as Saccharibacteria_TM7 in the control group relative to the CIN2/3-CC group. However, microbiome variation did not differ between the groups, either with regard to the diversity indices or sample variation (β-diversity). These results are supported by previous studies in which the bacterial diversities in the healthy group did not differ from those in CIN2/3-CC patients 9,21 . In the present study, we also found that species richness was higher in the CIN2/3-CC group than in the control group. In contrast to our results, an earlier study reported that bacterial richness was higher in women with high-grade CIN relative to healthy and low-grade CIN 22 . Microbial richness can be considered as an indicator of health status 12 , since www.nature.com/scientificreports www.nature.com/scientificreports/ variation of diversity can be associated with atypical health conditions. The higher richness observed in the present CIN2/3-CC group suggests that the bacterial diversity of cervical pre-and cancerous epithelial cells is partially altered from the healthy state 23 .
We also assessed whether selected Lactobacilli species could differentiate the CIN2:3-CC group from the control group. We observed that, although none of those bacteria reached statistical significance, L. inners showed a trend toward increased proportion, while the other identified Lactobacillus species, including L. crispatus, L. fornicalis, and L. vaginalis, tended to be less abundant proportions in the CIN2:3-CCgroup relative to the control group. Moreover, LEfSe analysis also demonstrated that an unclassified species of genus Gardnerella was more abundant in the control group. In line with our finding, another species of this genus (Gardnerella vaginalis) was associated with clearance of HPV infection 11 , which is the primary risk factor for cervical cancer. A study also reported that a cervical epithelium dominated by L. iners was associated with high-grade CIN 21 . In line with this, we found that 9 genera and 21 species were significantly abundant in the CIN2/3-CC group relative to the control group. Among them, the genera Prevotella, Staphylococcus, and Streptococcus were found to be  Table 3. Relative abundances of microbial taxa in control and CIN2/3-CCgroups. The statistical significance was tested using LEfSe analysis at the p-value of 0.05.  www.nature.com/scientificreports www.nature.com/scientificreports/ Among the KEGG pathways, oxidative phosphorylation (ko00190) and folate metabolism (ko00790) were significantly abundant in the CIN2/3-CC group. The folate metabolism pathway has been reported to be essential in proliferating tissues, and one-carbon metabolism is upregulated in cancers 25 . Although the folate biosynthesis pathway has been found to be significantly altered in the fecal microbiome of prostate 27 and cervical cancer patients, the biological function of folate in cervical carcinogenesis remains unclear. It might be linked with the activity and function of the fragile histidine triad and the 8-hydroxy-2'-deoxyguanosine gene 28 . However, there is also evidence that folate biosynthesis has an impact on different carcinogenesis processes, since certain anti-folate drugs, namely methotrexate and aminopterin, have been shown to be effective in treating cancers 29,30 . Therefore, enrichment of the folate biosynthesis pathway can be considered to be involved in cervical carcinogenesis. Further, it is commonly recognized that cell division exceeds the bioenergy demand through the process of aerobic glycolysis 31 . Due to the uncontrolled proliferation of cancer cells, the energy from aerobic glycolysis seems to be insufficient to support cellular metabolism in cell proliferation; therefore, cellular metabolism in cell proliferation is associated with, or even privileges, mitochondrial respiration through the oxidative phosphorylation pathway 32,33 . Pre-or cancerous cells use co-activator 1-alpha to boost the oxidative phosphorylation pathway 34 . Previous studies have reported that oxidative phosphorylation was more abundant in late-stage oral squamous cell carcinoma and breast cancer. Another study demonstrated that cells in colorectal cancer use the oxidative phosphorylation pathway to satisfy their metabolic demands 34 . In support of the above-mentioned findings, it is plausible that metabolic demand can explain the oxidative phosphorylation pathway enrichment of the microbiome in CIN2/3-CC group.
We also observed the enrichment of the starch and sucrose metabolism pathway in the control group compared with the CIN2/3-CC group. In line with our results, a recent study also demonstrated enrichment of the starch and sucrose metabolism pathway in healthy patients relative to those with prostate cancer 27 and bladder cancer 35 . The starch and sucrose metabolism pathway was associated with phosphotransferase enzymes, including sucrose-specific IIA, IIB, and IIC, as predicted using the PICRUSt pipeline ( Table 4). Instead of orthologs involved in the starch and sucrose metabolism pathway, we also identified certain orthologs that were more abundant in the control group, and that are involved in different biological processes. Interestingly, we found DNA polymerase v, which is often existent in many bacteria such as Escherichia coli as well as being involved in DNA replication, repair and damage tolerance. It has been reported that polymerase plays an essential role in trans-lesion synthesis in breast cancer cells 36 . We also found lactocepin, which is a lactic-acid-bacteria-secreted protein that has been shown to have a potential to degrade certain pro-inflammatory chemokines such as interferon-gamma-inducible protein 10 37 . It has also been demonstrated that lactocepin expressed by Bifidobacterium spp. can improve colitis associated with cancer in mice 38 . The PICRUSt pipeline has predicted, moreover, that DNA-3-methyladenine glycosylase, a DNA alkylation damage agent, has a cell-protective function against the killing effect of chloroethylnitrosoureas during cancer chemotherapy and also plays an important role in cancer prevention 39,40 . The control group in the present study also showed more abundant DNA-damage-inducible protein J, which belongs to a group of inducible proteins. These proteins have the main function of assuring the maintenance of the lesion repair process 41 . By contrast, we also predicted and profiled four orthologs that were more abundant in the CIN2/3-CC group relative to the control group, and two of them, DNA (cytosine-5)-methyltransferase 1 (DNCMT1) and putative transposase, were found to be involved in the DNA methylation and replication processes. DNCMT1 has been shown to have a potential for methylation-induced gene silencing as well as maintenance of CpG island methylation in human cancer cells 42,43 . As for the putative transposase, it has been demonstrated to act as an oncogenic mutator, and, therefore also, as a contributor to the development of a broad spectrum of tumors in human and mouse cancers 42,43 . The identification of these more abundant pathways and orthologs in the CIN2/3-CC group (not in the control group) was suggestive of a metabolism whereby changes are induced in cervical carcinogenesis.
In conclusion, in this case-control analysis of the cervical microbiome using 16 s rRNA amplicon sequencing with EzBioCloud, we observed significantly different microbial abundances and enriched metabolic functions between normal controls and CIN2/3-CC patients. The identification of certain species such as F. nucleatum and P. amnii and functional profiles (folate biosynthesis and oxidative phosphorylation) during CIN progression to cervical cancer might contribute to improved early diagnostics for patients with precancerous disease. Future studies should aim to elucidate the specific roles of bacteria, pathways and orthologs for enhanced understanding of the role of the cervical microbiome in cervical carcinogenesis.

Materials and Methods
Study design and subjects. The protocol for cohort study recruitment conformed to the Declaration of www.nature.com/scientificreports www.nature.com/scientificreports/ DNA sequencing was performed using a Roche platform (Roche454 GS-FLX plus, Branford, CT, USA) to generate single-end reads at Macrogen Company Ltd. (Seoul, Republic of Korea). Pyrosequencing reads are available in the EMBL SRA database (http://www.ebi.ac.uk/ena/data/view/PRJEB5760).
Quality-controlled 16S reads and taxonomic assignment. The single-end reads were uploaded to the EzBioCloud 16S-based MTP app (ChunLab, Inc., Seoul, Republic of Korea) to check the data quality. The cloud app of the Ezbiocloud software was used to detect and filter out sequences of low quality with regard to read length (<80 bp or >2,000 bp) and averaged Q values less than 25. Denoising and extraction of non-redundant reads were carried out using DUDE-Seq software. The UCHIME algorithm was applied against the Ezbiocloud 16S chimera-free database to check and remove chimera sequencing. Taxonomic assignment was performed using the USEARCH program to detect and calculate the sequence similarities of the query single-end reads against the EzBioCloud 16S database. EzBioCloud sequencing reads were clustered into OTUs at 97% sequence similarity using the UPARSE algorithm 45 . Single-end reads from each sample were clustered into many OTUs using the UCLUST tool with the above-noted cutoff values.

Functional metagenome prediction.
For the EzBioCloud 16S-based MTP pipeline, the PICRUST algorithm was used to estimate the functional profiles of the microbiome identified using 16S rRNA sequencing. The raw sequencing reads were computed using the EzBioCloud 16S microbiome pipeline with default parameters and discriminating reads that were encountered in the reference database. The functional abundance profiles of the cervical microbiome were annotated based on bioinformatics analyses, specifically by multiplying the vector of gene counts for each OTU by the abundance of that OTU in each sample 17 , using the KEGG (Kyoto Encyclopedia of Genes and Genomes) orthology and pathway database. The predicted metagenome profiles were categorized into clusters of KEGG Orthology and KEGG pathways and compared between the control and CIN2/3-CC groups. The accuracy of each of the functional profiles was determined according to the nearest sequenced taxon index 21 . Statistical and bioinformatic analyses. The differences in the demographic and lifestyle characteristics were examined between the groups using the t-test for continuous variables and the chi-squared test for categorical variables. Microbial richness was measured by Chao1 and the number of OTUs found in the microbiome taxonomic profile (MTP) index. The Shannon and Simpson α-diversity indices were applied to estimate the diversity for each group using the Wilcoxon rank-sum test. Beta diversity was calculated with Bray-Curtis distances based on the taxonomic abundance profiles. Permutational multivariate analysis of variance (PERMANOVA) was applied to measure the statistical significances of β-diversity. LEfSe was performed to determine enrichment in the assigned taxonomic and functional profiles of the two groups. Taxonomic levels with LEfSe values higher than 2 at a p-value < 0.05 were statistically significant. The ggplot2 package in the R program (version 3.4.3., R Foundation for Statistical Computing, Vienna, Austria) was used to visualize the LEfSe differences between the groups. All of the calculated p values were two-tailed and considered statistically significant at p < 0.05.

Data availability
The datasets generated during the current study are available in the EMBL SRA database (http://www.ebi.ac.uk/ ena/data/view/PRJEB5760).