Introduction

Human papillomaviruses (HPVs) are a group of double-stranded DNA viruses comprising up to 223 different types, with new types being continuously identified1,2,3,4. About 12 HPV types are currently classified as oncogenic, high-risk genotypes, and persistent infection with these oncogenic types is known to be a necessary cause of cervical cancer.

Although effective screening methods (cytology and HPV testing) exist and several effective prophylactic vaccines are available on the market to prevent cervical cancer, the annual number of new cases of cervical cancer is estimated to increase from 570,000 to 700,000 between 2018 and 2030, with the annual number of deaths rising from 311,000 to 400,000 (ref. 5). Cervical cancer is preventable, yet it remains one of the most common cancers and causes of cancer-related death in women worldwide.

Due to advances in molecular testing, the more sensitive method of HPV-based screening is now replacing cytology as the primary cervical cancer screening tool. This method tests cervical samples for HPV DNA rather than for morphological abnormalities and constitutes a more automated and objective screening strategy. Women with a negative HPV test have a very low risk of subsequent cancer and may safely be recalled after three to seven years. Nevertheless, a significant proportion of cervical cancers (around 7%) appear to be HPV negative6,7,8. Because of these HPV negative cancers, some women may not be categorized correctly within the screening program, and their tumors may be detected only once they have become symptomatic, that is, at a very late stage. Of particular note, cervical cancers that appear to have lost HPV DNA have been shown to constitute a specific subgroup with a different biological behavior (worse prognosis)9,10.

Efforts to explain HPV negativity in cervical cancers have led scientists to reanalyze smears and retest these HPV negative tumors using different approaches. The main explanations for HPV negativity have been sample inadequacy, false diagnosis, the presence of cancers associated with types that were not targeted in the studies, and false negative results due to low sensitivity of the HPV detection methods6,7,8,11.

Most HPV screening methods focus on detecting high-risk HPV types and are based on PCR amplification of conserved regions in the L1 gene by consensus primers, followed by genotype identification through hybridization of amplicons to type-specific probes12,13. However, these methods are biased towards known HPVs that bind specifically to the designed primers and probes. Unknown HPV types, known HPV types that are distantly related, and targeted HPV types with variations within the probe/primer regions may escape amplification or hybridization4,14,15,16,17. To overcome this bottleneck, it is now possible to perform unbiased metagenomic sequencing (not based on PCR) and detect all HPVs present in a sample, without prior knowledge of which types might be present14,18,19. Furthermore, if cDNA is sequenced, the data will reveal whether there is viral transcriptional activity, which is essential for both initiation and maintenance of the malignant phenotype.

Following the considerations above, cervical tumors classified as HPV negative after (a) assuring sample adequacy, (b) reanalyzing the corresponding slides/smears to confirm the tumor diagnosis and (c) finding no HPV in an unbiased sequencing analysis may be called truly HPV negative. Understanding the biology of truly HPV negative cervical cancers is urgently needed in the era of cervical cancer elimination.

Cervical cancer is known to have an infectious cause, with HPV being the primary predisposing factor. Truly HPV negative cervical cancers could have initially been caused by HPV but lost the virus as the tumor progressed. A very high proportion of cervical intraepithelial neoplasia (CIN) 3 lesions contain HPV, and there are cases of HPV negative cancers that were HPV positive at earlier screening appointments20. On the other hand, it is possible that another infectious agent plays a role in this specific subgroup of HPV negative cancers. Taking both possibilities into consideration, a search for an infectious cause/biomarker should focus on unbiased RNA sequencing of truly HPV negative cancers. We aimed to analyze and compare the metatranscriptome of HPV positive cervical cancers with that of HPV negative cancers and to assess whether there are differences that could act as biomarkers for the early identification of these HPV negative cancers.

Materials and methods

In a previous study, we systematically genotyped all cervical cancer cases occurring in Sweden from 2002–2011 (n = 2850). For those that were HPV negative (n = 527, 18.5% of total cervical tumors), sample adequacy was analyzed and the diagnosis was reanalyzed. After excluding samples that were not adequate (beta-globin negativity) and those whose diagnosis was not confirmed, a different PCR approach (targeting the E6/E7 genes instead of L1) was performed. The number of repeatedly negative samples decreased to 394/2850 (13.8% of total cervical tumors). We then subjected all HPV negative carcinomas (and a subset of 59 HPV PCR positive cervical cancers used as positive controls) to unbiased RNA sequencing on a NovaSeq 6000 (Illumina platform), generating high-quality sequencing data with a median of 30 million reads per sample. HPV RNA positivity was detected in 169/392, decreasing the percentage of cervical cancers negative for HPV from 18.5% to 7.8% (223/2850)15.
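The sample counts above can be retraced with a few lines of arithmetic. The following is only an illustrative check in Python, not part of the original analysis; the variable names are hypothetical, and every figure is taken from the paragraph above.

```python
# Counts as stated in the text (all cervical cancers in Sweden, 2002-2011).
total_cancers = 2850
pcr_negative_initial = 527   # HPV negative after the first genotyping
pcr_negative_repeat = 394    # still negative after review and E6/E7 PCR
sequenced_negative = 392     # denominator reported for the RNA sequencing result
rna_positive = 169           # HPV detected by unbiased RNA sequencing


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Specimens that remained HPV negative after sequencing.
truly_negative = sequenced_negative - rna_positive

print(percent(pcr_negative_initial, total_cancers))             # 18.5
print(percent(pcr_negative_repeat, total_cancers))              # 13.8
print(truly_negative, percent(truly_negative, total_cancers))   # 223 7.8
```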

For this study, we selected all 223 HPV negative specimens and compared their metatranscriptomes with those detected in 223 HPV positive cervical cancers. These 223 HPV positive cervical specimens corresponded to 169 specimens that were originally HPV PCR negative but turned out to be HPV positive when subjected to unbiased sequencing, as well as 54/59 HPV PCR positive samples used previously as positive controls (5/59 HPV PCR positive samples were HPV negative when subjected to sequencing). A total of 11 pools of blank paraffin blocks were used as negative controls. While each formalin-fixed paraffin-embedded (FFPE) cervical tumor block was being sectioned, blank paraffin blocks had been sectioned in between as a control for contamination. These paraffin blocks were extracted in the same manner as the cervical tumors, pooled (each pool containing about 45 blank blocks) and sequenced together with the corresponding tumor blocks.

High-quality non-human reads were classified using Kraken2 v. 2.1.1 (ref. 21), which was run against a reference database containing all RefSeq bacterial and viral genomes (built in December 2020) with a 0.1 confidence threshold. A cut-off of 10 classified unique reads was used to discriminate positive genera for bacteria and viruses, and results reported all genera which comprised more than 1% of total bacterial or viral reads, respectively.
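The two reporting cut-offs described above can be expressed as a simple filter. The sketch below is a minimal illustration in Python and is not the pipeline used in the study: the function name, the input format (a mapping from genus to classified unique reads for one sample) and the genus labels are hypothetical, and it assumes that "a cut-off of 10" means at least 10 reads and that the 1% fraction is computed per sample.

```python
def reportable_genera(unique_reads, min_reads=10, min_fraction=0.01):
    """Return {genus: fraction of total reads} for genera passing both cut-offs.

    unique_reads: dict mapping genus name -> classified unique reads
                  (hypothetical input format, one kingdom and one sample).
    """
    total = sum(unique_reads.values())
    if total == 0:
        return {}
    return {
        genus: count / total
        for genus, count in unique_reads.items()
        # Assumption: at least `min_reads` reads AND more than `min_fraction`.
        if count >= min_reads and count / total > min_fraction
    }


# Hypothetical counts, for illustration only.
sample = {"GenusA": 9000, "GenusB": 950, "GenusC": 45, "GenusD": 5}
reported = reportable_genera(sample)
print(sorted(reported))  # ['GenusA', 'GenusB']
```

In this example GenusC has enough reads but falls below the 1% fraction, and GenusD fails the read cut-off, so neither is reported.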

As HPV was the key difference between the cervical tumors (and the only significant difference detected between the 2 groups), we also queried all high-quality non-human reads against a reference database containing all human papillomavirus nucleotide sequences deposited in GenBank until July 8th 2022 (parameters: Taxid 151340, length 5000–12,000, excluding non-partial genomes) using Kraken2 v. 2.1.1 (ref. 21), with a 0.1 confidence threshold. A cut-off of 10 classified unique reads was used to discriminate positive species. Reads that corresponded to HPV genomes were subjected to visual inspection using Integrative Genomics Viewer to confirm mapping.

Statistical analysis

Differences in bacterial/viral presence across the two groups (HPV positive and HPV negative cancers) were evaluated by comparing median unique reads with the nonparametric Wilcoxon rank-sum test, and relative proportions of bacterial/viral communities among the HPV positive and HPV negative cervical cancers were analyzed using a two-proportion Z-test and its associated p-value. Bonferroni correction was applied considering a 0.05 error rate (alpha level), and results were considered statistically significant if p < 0.0002.
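The proportion comparison and the corrected significance threshold can be sketched as follows. This is an illustrative Python implementation of a standard pooled, two-sided two-proportion Z-test, not the code used in the study. The function names and the example counts are hypothetical, and the number of comparisons (250) is an assumption chosen only because 0.05/250 reproduces the stated threshold of 0.0002; the text does not give the actual number of tests.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled two-proportion Z-test; returns (z, p_value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are all-positive or all-negative: no difference to test.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts: a genus present in 60/100 vs 40/100 specimens.
z, p = two_proportion_z_test(60, 100, 40, 100)
threshold = bonferroni_threshold(0.05, 250)  # assumed number of comparisons
print(round(z, 3), round(p, 4), p < threshold)  # 2.828 0.0047 False
```

The example shows why the correction matters: a difference that would pass an uncorrected 0.05 level does not reach the corrected threshold.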

Ethical approval was granted by the Regional Ethical Review Board of Stockholm, Sweden (EPN-Dnr: 2012/1028/32). The Regional Ethical Review Board determined that, due to the population-based nature of the study, informed consent from study participants was not required (EPN-Dnr: 2011/1026–31/4). All methods were performed in accordance with the relevant guidelines and regulations.

Results

In a previous study, all cervical cancers diagnosed from 2002 to 2011 in the whole of Sweden were collected and genotyped. HPV negativity was determined after confirmation of sample adequacy, reanalysis of the diagnosis and confirmation of cervical cancer, and failure to detect HPV by both PCR and unbiased RNA sequencing15. A total of 223 HPV negative cervical cancers were reported. The present study aimed to compare the metatranscriptome present in these 223 HPV negative cervical cancer specimens with the corresponding metatranscriptome present in 223 HPV positive cervical tumors.

The median age at cancer diagnosis of the HPV negative cervical cases (n = 223) was 68 years (range 30–93 years) and the median age at cancer diagnosis of the HPV positive cervical cases (n = 223) was 56 years (range 24–95 years).

Bacteriome

The metatranscriptome analysis from RNA sequencing yielded a total of 63.80 M annotated bacterial reads for the HPV positive specimens and 62.81 M annotated bacterial reads for the HPV negative specimens.

A total of 84 different bacterial genera showing at least 10 reads and 1% of total bacterial annotated reads were detected within the 223 HPV positive and 223 HPV negative cervical tumors. Only 6/84 genera showed a positive median number of reads/sample for both specimen groups (HPV positive and HPV negative cancers) (Table 1).

Table 1 Bacterial genera and annotated reads detected in cervical cancer specimens.

The same 6 genera were found in both types of cervical tumors (HPV positive and HPV negative cancers), and the corresponding proportions were similar. In descending order of median annotated reads/sample, the genera detected in HPV positive and HPV negative specimens, respectively, were Klebsiella (66,563 and 69,618 median reads/sample), Staphylococcus (40,020 and 39,935 reads), Pasteurella (31,263 and 31,026 reads), Burkholderia (5633 and 5643 reads), Paracoccus (4817 and 4955 reads) and Bacillus (2898 and 3260 reads). All 6 genera were present in at least 87.89% of specimens. No statistical difference (p < 0.001) in either median read level or proportion was detected when comparing any bacterial genus between HPV positive and HPV negative specimens (Table 1).

Analysis of controls (blank paraffin specimens) showed a total of 31 different genera, with 28/31 genera also being detected in cervical cancer specimens. All 6 genera that showed positive median read values in cervical tumors were also present in at least one of the blank controls, with Klebsiella, Staphylococcus, Pasteurella and Paracoccus being identified in 1/11 blank controls and Burkholderia and Bacillus in 5 and 3 blank controls, respectively.

Virome

The metatranscriptome analysis from RNA sequencing yielded a total of 613,606 annotated viral reads for the HPV positive specimens and 575,734 annotated viral reads for the HPV negative specimens.

Overall, 63 different viral genera were detected when analyzing the RNA metatranscriptome within HPV positive and HPV negative tumors, with 6/28 genera showing a positive median annotated read/sample value (Table 2).

Table 2 Viral genera and annotated reads detected in cervical cancer specimens.

Three of these 6 genera (Gorganvirus, Orthobunyavirus and Betabaculovirus) showed a positive median read value in both HPV negative and HPV positive cervical tumors, while Alphabaculovirus, Alphapapillomavirus and Pandoravirus showed a positive median read value only in HPV positive tumors. Statistical significance (p < 0.001) was only detected for the Alphapapillomavirus genus (both for the difference in median unique reads and for the relative proportions). The Alphapapillomavirus genus was detected in 142/223 HPV positive specimens and in 3/223 HPV negative specimens according to the established cut-offs and reporting parameters (10 reads and at least 1% of total viral reads).

Further analysis of HPV presence was performed by subjecting high-quality non-human reads to a more “complete” database containing thousands of human papillomavirus sequences deposited in GenBank, using Kraken2. The analysis revealed HPV positivity (defined as more than 10 reads/species) in all 223 HPV positive cervical tumors, 7/10 blank control pools and 30/223 HPV negative cervical tumors. Visual inspection of reads using Integrative Genomics Viewer showed that the blank controls and HPV negative cervical tumors that turned out to be HPV positive in this analysis did in fact show more than 10 reads, but the reads were very short (< 50 bp) and viral coverage was below 10%. All 223 HPV positive cervical tumors showed more than 10 reads, and HPV coverage was above 10% for all specimens.

Analysis of controls (blank paraffin specimen pools) revealed the presence of 74 different viral genera, with 30/74 genera also being present in HPV negative and/or HPV positive cervical cancer samples. Most of the genera (25/30) were present in less than half of the blank controls, while Locarnavirus, Chlorovirus, Gemycircularvirus, Orthobunyavirus and Betabaculovirus were present in 6/11, 9/11, 9/11, 10/11 and 11/11 blank pools, respectively.

Discussion

HPV negative cervical cancers exist and have distinct properties, such as diagnosis at a late stage and poor prognosis. In this paper, we compared the metatranscriptome from 223 HPV negative cervical cancers with the metatranscriptome from 223 HPV positive cancers to determine whether there was any difference between them.

Strengths of this study include the use of HPV negative cancers whose diagnosis, sample adequacy and HPV result had been confirmed and analyzed with different methods in order to rule out false negativity. FFPE specimens were reanalyzed by an expert pathologist and confirmed to contain invasive cervical cancer tissue. Beta-globin was present in all specimens included in the study, and HPV detection was performed using PCR targeting L1 and E6/E7 as well as unbiased RNA sequencing. Furthermore, blank paraffin controls were used to assess possible contamination as well as the presence of environmental communities, and Bonferroni correction was applied to prevent results from incorrectly appearing to be statistically significant (a common event when performing multiple comparisons).

Bacterial and viral communities have already been analyzed and compared between health and disease, especially thanks to the effort of the Human Microbiome Project (HMP), the first large study to address the diversity of microorganisms present in the different organs of the human body22. The literature agrees that “healthy” women show a vaginal microbiome with low diversity, dominated by Lactobacillus, and that the most prominent features of “disease” are an increase in pH (due to a decrease in lactate concentration), a reduction of lactobacilli and a great diversity of bacterial vaginosis-related bacteria, which are primarily anaerobic23. Studies on HPV positive CIN and cervical cancers show higher diversity in the vaginal microbiota, with depletion of Lactobacillus crispatus and increased abundance of anaerobic bacteria24,25. While our results from cervical cancer cases agree with the literature (higher diversity of bacteria and low abundance of Lactobacillus, only present in 10/446), microbiome differences between HPV negative and HPV positive cervical cancers have not been studied before. No significant difference among the genera present was detected except for the presence of alphapapillomaviruses. This is in line with the fact that HPV is the only infectious agent reported so far to be associated with cervical cancer, and it supports the hypothesis that loss of HPV may occur as the tumor progresses and may explain the occurrence of these cancers, which account for almost 7% of all cervical cancers diagnosed. A weakness of the present study is the lack of previous screening specimens from the corresponding women, which would have shown whether HPV was detectable prior to cancer diagnosis.

Human papillomaviruses were initially reported in 143/223 HPV positive cervical cancers (142 samples showing predominantly an alphapapillomavirus genus, and 1 specimen showing a gammapapillomavirus) and in 3/223 HPV negative cervical cancers when querying sequencing reads against the RefSeq database. Genera were reported when they presented more than 10 reads and comprised at least 1% of total bacterial/viral reads/sample. A further investigation of HPV presence using a broader HPV database identified HPV in all HPV positive tumors, as well as in 7/10 blank control pools and 30/223 HPV negative cervical tumors. Visual inspection of these reads revealed questionable calling for both the blank pools and the 30/223 HPV negative cervical tumors, where the viral coverage was below 10%.

There is no consensus about which cut-offs to use for HPV “calling” when performing sequencing analysis, and there is an urgent need to establish them26. To reduce false-negative/positive classification, it is imperative to use complete and updated databases and to take into consideration both the number of reads/k-mers (e.g. at least 10 reads) and the genome coverage (e.g. at least 10%) to achieve accurate identification. Accepting the number of reads/k-mers as the only parameter is not sufficient, as artefacts/background/noise may produce false positivity26,27.
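A calling rule that combines both criteria can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure used in the study (where coverage was assessed by visual inspection in Integrative Genomics Viewer): the function names are hypothetical, alignments are assumed to be available as 0-based half-open (start, end) intervals on a single reference genome, and the genome length and read positions in the example are invented.

```python
def genome_coverage(alignments, genome_length):
    """Fraction of reference positions covered by at least one read.

    alignments: iterable of (start, end) tuples, 0-based, end exclusive
                (assumed input format).
    """
    covered = 0
    current_end = 0  # right edge of the region counted so far
    for start, end in sorted(alignments):
        start = max(start, current_end, 0)  # skip positions already counted
        end = min(end, genome_length)
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length


def call_hpv(alignments, genome_length, min_reads=10, min_coverage=0.10):
    """Call HPV positive only if both read count and coverage pass."""
    alignments = list(alignments)
    return (len(alignments) >= min_reads
            and genome_coverage(alignments, genome_length) >= min_coverage)


GENOME_LENGTH = 8000  # hypothetical reference length in bp

# 12 short reads stacked on one locus: enough reads, negligible coverage.
piled = [(100, 140)] * 12
# 12 reads of 100 bp tiled along the genome: 1200/8000 = 15% coverage.
tiled = [(i * 100, i * 100 + 100) for i in range(12)]

print(call_hpv(piled, GENOME_LENGTH))  # False
print(call_hpv(tiled, GENOME_LENGTH))  # True
```

The first case mirrors the situation described above, in which a read-count cut-off alone would call a sample positive although the reads cover only a small part of the viral genome.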

In conclusion, the present study reports the metatranscriptome analysis of both HPV negative and HPV positive cervical tumors and does not detect any statistical difference in the expression of bacterial/viral communities when comparing HPV positive and HPV negative cervical cancers (except for Alphapapillomavirus). Further studies are needed to find differences and/or biomarkers for the early identification of this biologically distinct subgroup of HPV negative cervical cancers. As the study clearly shows that the metatranscriptome does not differ much between HPV negative and HPV positive cancers, we suggest that it may be more rewarding to look for differences in the human genome and transcriptome between HPV positive and HPV negative cervical cancers.