Introduction

Cervical cancer is the fourth most frequent cancer in women worldwide, with approximately half a million new cases each year, of which ~50% are fatal [1]. Nearly all cervical cancers are associated with infection with one of 14 high-risk human papillomaviruses (high-risk HPV), with subtypes HPV16 and HPV18 being responsible for ~70% of cases [1, 2]. Persistent infection with high-risk HPV is considered a necessary cause for the development of squamous cell carcinoma of the cervix, while cervical adenocarcinomas are often, but not always, HPV-related [2,3,4,5,6].

The lifetime risk for women to acquire an HPV infection is ~80%, but the vast majority of infections are cleared within 2 years. In ~10% of women the infection persists [1, 7, 8]. These cases may be associated with the development of low-grade lesions (cervical intraepithelial neoplasia), most of which, however, regress within a year. Only a minority of all persistent high-risk HPV infections become transforming infections that may progressively develop into high-grade cervical intraepithelial neoplasia and cancer, a process that may take years [9]. The risk of developing high-grade cervical intraepithelial neoplasia is higher after infection with HPV16/18 than with one of the other high-risk HPV subtypes [10]. Since high-risk HPV DNA testing cannot distinguish between transient infections and clinically relevant, transforming infections, methods to enable risk prediction in high-risk HPV DNA-positive tissues are needed.

In its episomal form, HPV is characterized by expression of the early gene E2, a repressor of the oncogenes E6 and E7. Integration of high-risk HPV in the host genome is thought to be an important step in the oncogenesis of cervical cancer [11, 12], as it leads to reduced E2 expression and activation of the E6 and E7 oncogenes [13, 14]. E6 and E7 gene products are primarily responsible for oncogenic transformation by interference with p53 and pRb proteins, disrupting DNA repair regulation and cell cycle checkpoints [12, 15,16,17,18,19,20,21,22]. Overexpression of E6 and E7 oncogenes can therefore be used to identify transforming infections, whereas levels of E2 expression may provide additional information on viral integration state [6]. To obtain a comprehensive understanding of cervical carcinogenesis in clinical samples, methods for simultaneous detection of E2, E6, and E7 transcripts of all high-risk HPV subtypes are therefore needed, and multiple tests for HPV E6/E7 mRNA detection are now available [23, 24]. The slow development from initial high-risk HPV infection to cervical cancer makes it likely that gradual accumulation of tumorigenic events in cervical intraepithelial neoplasia is required for progression to cancer [25]. Mapping of these events will further contribute to our knowledge of cancer biology in the cervix.

The aims of this study were therefore to (1) investigate the value of targeted RNA next-generation sequencing for high-risk HPV genotyping and oncogene activity monitoring in cervical tissues, (2) identify possible biomarkers other than high-risk HPV that may distinguish normal cervical tissue from cervical cancer, and (3) determine the quality of targeted RNA sequencing data originating from cervical scrape samples, possibly allowing better noninvasive risk assessment for the development of cervical cancer.

Materials and methods

Patient material

In this prospective single-center study frozen cervical tissue samples were included from 22 non-pregnant Dutch women, presenting at the department of Obstetrics and Gynecology of the Radboudumc for surgery of the uterus and/or cervix for different clinical indications. Punch biopsies from cervical tissue were taken during surgery or after extirpation. Tissues were snap frozen in liquid nitrogen and stored at −80 °C until RNA isolation. H&E stainings were performed on 4 µm cryosections and reviewed by a pathologist (JB, Table 1). Based on histopathology, which was concordant with clinical patient diagnosis, tissues were classified as normal cervix tissue (n = 4, N1-4) or cervical cancer (n = 18, composed of 15 squamous cell cancers [SCC1–15] and 3 adenocarcinomas [AdCa1–3]). Evalyn Brush (Rovers Medical Devices B.V., Oss, Netherlands) cervicovaginal samples were collected in triplicate from two healthy women and either stored dry or in Preservcyt solution (Hologic, Marlborough, MA, USA) for different time spans (dry: 1 h, 1 week; Preservcyt: 1 h, 24 h, 1 week, 2 weeks). Written informed consent was obtained from all women in this study. The protocol was approved by the regional institutional review board (No. 2014–1295). All methods were performed in accordance with the guidelines for use of human tissue of the Radboudumc and the tissue samples were anonymized to the researchers.

Table 1 Tissue samples with diagnosis and cancer stage

HPV genotyping

DNA was isolated from all tissues using MagNA Pure (Roche, Basel, Switzerland). The purified DNA was eluted in 50 μl TE-buffer. For detection and genotyping of HPV, broad spectrum HPV amplification was performed using the short-PCR-fragment line probe assay (SPF10-LiPA25; Labo Bio-medical Products B.V., Rijswijk, The Netherlands). This assay amplifies a small fragment of 65 bp from the L1 open reading frame and allows detection of a broad range of HPV genotypes with high analytical sensitivity [26].

Targeted RNA next-generation sequencing

Targeted RNA sequencing was performed essentially as described before [27], without prior knowledge of HPV status. RNA was isolated from 10 µm cryosections using TRIzol reagent (ThermoFisher Scientific, Waltham, MA, USA) and reverse transcribed with Superscript II (ThermoFisher Scientific) using random hexamer primers, according to the manufacturer’s instructions. A previously described cohort of single-molecule molecular inversion probes [27] was expanded with, among others, probes recognizing E2, E6, and E7 transcripts from high-risk HPV types 16, 18, 31, 33, 45, and 52. Single-molecule molecular inversion probes to detect E2, E6, and E7 transcripts from HPV39 and HPV56 were initially not included in the panel, but were added at a later stage to confirm line probe assay results. Probe design was based on the MIPgen algorithm as described by Boyle et al. [28], and included a random octanucleotide unique molecule identifier (UMI).

All assay preparations were performed at 4 °C. Single-molecule molecular inversion probe pool preparation, capture and exonuclease treatment were performed using the Microlab STAR automated liquid handling workstation (Hamilton, Reno, Nevada, USA). Single-molecule molecular inversion probes were pooled at 100 µM/probe (246 transcripts, 1301 probes) and phosphorylated using T4 Polynucleotide Kinase (New England Biolabs [NEB], Ipswich, MA, USA) in T4 DNA ligase buffer (NEB) for 45 min at 37 °C, followed by 20 min inactivation at 65 °C. The capture reaction was performed using 15–50 ng of cDNA and a 1 pM pool of phosphorylated probes. Capture and enzymatic circularization by primer extension and ligation were performed in a reaction mixture containing Ampligase buffer (Epicentre, Madison, WI, USA), dNTPs, Hemo KlenTaq enzyme (NEB) and DNA ligase (Ampligase, Epicentre) by incubation for 10 min at 95 °C followed by 18 h at 60 °C. Non-circularized single-molecule molecular inversion probes and remaining RNA and cDNA were removed by exonuclease treatment as described [27]. The pool of circularized probes was subjected to PCR amplification using the Microlab STARlet automated liquid handling workstation (Hamilton), with 2× iProof High-Fidelity DNA Polymerase Master Mix (Bio-Rad, Hercules, CA, USA) and a primer set including a unique barcoded reverse primer for each sample. A total of 2 μl of each sample was pooled using the Microlab STAR and 266 bp amplicons were purified using AMPureXP beads (Beckman Coulter Genomics, High Wycombe, UK). After quality control on a TapeStation 2200 (Agilent Technologies, Santa Clara, CA, USA) and quantification using Qubit (Life Technologies, ThermoFisher Scientific, Waltham, MA, USA), a 4 nM library was sequenced on the Illumina NextSeq platform (Illumina, San Diego, CA, USA) at the Radboudumc sequencing facility.
Reads were mapped against reference transcripts (UCSC human genome assembly hg19 and variant-specific FASTA sequences) using the SeqNext module of JSI SequencePilot version 4.2.2 build 502 (JSI Medical Systems, Ettenheim, Germany). All identical PCR products were reduced to one consensus read (unique read) using the UMI. Unique read counts for each single-molecule molecular inversion probe in a sample were divided by the total of unique read counts in that sample and multiplied by 10^6 (Fragments per Million, FPM). Individual transcript levels were expressed as mean FPM of all probes targeting that transcript. High-risk HPV transcripts with fewer than 10 unique reads were considered not detected. Variant calling was performed within SeqNext, excluding all variants with a coverage of <10% variant reads.
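The normalization described above can be illustrated with a minimal Python sketch. It is not the SeqNext implementation: it assumes that each mapped read has already been reduced to a (probe, UMI) pair and simply counts distinct UMIs per probe, whereas SeqNext builds consensus reads. All function names and the example probe names are hypothetical.

```python
from collections import defaultdict

MIN_UNIQUE_READS = 10  # detection threshold for high-risk HPV transcripts (from the text)


def unique_read_counts(reads):
    """Count unique reads per probe; reads sharing (probe, UMI) are PCR duplicates."""
    umis_per_probe = defaultdict(set)
    for probe, umi in reads:
        umis_per_probe[probe].add(umi)
    return {probe: len(umis) for probe, umis in umis_per_probe.items()}


def fpm(unique_counts):
    """Scale per-probe unique read counts to Fragments per Million within one sample."""
    total = sum(unique_counts.values())
    if total == 0:
        return {probe: 0.0 for probe in unique_counts}
    return {probe: count / total * 1e6 for probe, count in unique_counts.items()}


def transcript_fpm(probe_fpm, probe_to_transcript):
    """Express each transcript as the mean FPM of all probes targeting it."""
    per_transcript = defaultdict(list)
    for probe, value in probe_fpm.items():
        per_transcript[probe_to_transcript[probe]].append(value)
    return {t: sum(v) / len(v) for t, v in per_transcript.items()}


def is_detected(unique_reads, threshold=MIN_UNIQUE_READS):
    """A transcript with fewer unique reads than the threshold is treated as not detected."""
    return unique_reads >= threshold
```

Because FPM values are relative to the total unique reads of a sample, they are comparable between samples with different sequencing depth, but they depend on the composition of the probe panel.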

Statistical analysis

Statistical analyses were performed in R (version 3.4.3). Mean FPM values were log2 transformed (after addition of 0.01 to prevent log(0) transformation errors) and clustered in an unsupervised manner using the Manhattan distance and Ward.D2 method, and translated into a heatmap using ClustVis. A Wilcoxon Mann–Whitney U test was performed to study differential gene expression between clusters. Multiple testing corrections were done using Benjamini–Hochberg (FDR < 0.01).
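As an illustration of three of the steps named above, the following Python sketch shows the log2 transformation with a 0.01 pseudocount, the Manhattan distance between two expression profiles, and the Benjamini–Hochberg adjustment. The analyses in this study were performed in R and ClustVis; this sketch is a standard-library re-expression for clarity only, with hypothetical function names, and it does not perform the Ward.D2 clustering or the Wilcoxon Mann–Whitney U test.

```python
import math


def log2_fpm(values, pseudocount=0.01):
    """Log2-transform mean FPM values; the pseudocount avoids log(0)."""
    return [math.log2(v + pseudocount) for v in values]


def manhattan(profile_a, profile_b):
    """Manhattan (L1) distance between two equally long expression profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene would then be called differentially expressed when its adjusted p-value falls below the chosen FDR threshold.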

Results

High-risk HPV DNA vs. RNA detection

Line probe assay testing detected high-risk HPV DNA in two out of four normal cervix tissues, all 15 squamous cell carcinomas, and two of the three adenocarcinoma tissues (Fig. 1a). Profiling the same set of tissues for HPV RNAs in the targeted RNA sequencing assay, blinded to the line probe assay results, showed activity of viral oncogenes in all HPV DNA-positive squamous cell carcinoma and adenocarcinoma samples. All genotypes identified by RNA sequencing were concordant with the outcome of the line probe assay (Fig. 1a). In two squamous cell carcinomas the line probe assay detected HPV39 and HPV56, two subtypes for which no single-molecule molecular inversion probes were included in the original targeted RNA sequencing panel. Transcriptional activity of these subtypes was confirmed in an additional analysis (Fig. 1a). In AdCa2, HPV16 and HPV18 were detected by line probe assay, but only HPV18 E6 and E7 oncogene activity was detected by targeted RNA sequencing. In SCC11, HPV16 and HPV39 DNA was detected, but only HPV39 E6 and E7 RNA was found.

Fig. 1

a DNA versus RNA detection of high-risk HPV. HPV DNA presence (line probe assay; LiPA) versus presence of high-risk HPV E2, E6, and E7 gene transcripts (targeted RNA sequencing; t/RNA-NGS) in cervical tissue samples. Targeted RNA sequencing-based high-risk HPV genotyping was concordant with line probe assay-based genotyping. All cancers in which high-risk HPV was detected by line probe assay expressed E6 and E7 oncogenes, whereas all normal tissues did not express HPV genes, irrespective of line probe assay outcome. E2 expression was not detected in four squamous cell carcinomas (SCC) and one adenocarcinoma (AdCa). b Mean FPM values of E2, E6, and E7 transcripts of the high-risk HPV RNA-based genotypes as shown in a

Samples that tested high-risk HPV-positive by line probe assay analysis but were transcriptionally inactive by targeted RNA sequencing were normal cervical tissues (n = 2). One adenocarcinoma tested negative in both DNA and RNA assays, in agreement with previous studies showing that cervical adenocarcinomas are not always caused by high-risk HPV infection [2,3,4,5].

High-risk HPV E2 versus E6/E7 expression

We next studied whether we could distinguish E2 from E6/E7 viral gene expression, which may be used to discriminate transforming infections from clinically irrelevant infections. In all 17 high-risk HPV RNA-positive tissues oncogenes E6 and E7 were expressed (Fig. 1a), with expression levels ranging from ~440 to 21,000 FPM (Fig. 1b). E6 expression exceeded E7 expression in 13/17 tissues (~75%). E2 expression was undetectable in 5 of the 17 samples (~30%), of which four were squamous cell carcinomas and one was an adenocarcinoma. Coexpression of E2 (~740–23,700 FPM, Fig. 1b), E6 and E7 was observed in the other 12 tissues (~70%).

Non-HPV biomarkers of cervical cancer

To discover cervical cancer biomarkers other than high-risk HPV, we performed agglomerative unsupervised cluster analysis of all gene expression data excluding HPV (Fig. 2). Normal cervical tissues and cancer tissues formed separate clusters. Within the cancer cluster, adenocarcinomas clustered together, including the HPV-negative AdCa3. SCC11, the only HPV39-positive squamous cell carcinoma in the cohort, did not cluster with the other cervical cancers. We found no association of sample clusters with high-risk HPV E2 versus E6/E7 expression levels.

Fig. 2

Heatmap of targeted RNA sequencing outcome of gynecological tissues. Samples include normal cervical tissue (n = 4) and cervical cancer (n = 18). Histopathology diagnosis is presented in the upper bars. The heatmap of 219 genes was generated by unsupervised hierarchical clustering using the Manhattan distance and Ward.D2 clustering method. Note that normal cervical tissues and cancer tissues formed separate clusters, with the exception of one squamous cell carcinoma that clustered with the normal tissues. AdCa adenocarcinoma, SCC squamous cell carcinoma

Statistical analysis revealed that 52 genes were differentially expressed between normal cervix and cervical cancer tissues with p < 0.05 (Table 2, see Supplementary Table S1 for all transcripts in the analysis). Interestingly, many tyrosine kinases that are considered targets for precision medicines (FGFRs, PDGFRs, KDR, KIT, AXL, ERBB2, NTRK2) were expressed at lower levels in cancers than in normal cervix tissue, with the exception of the proto-oncogene MET. We previously observed a similar trend in renal cancer [29]. In general, genes involved in glycolysis were expressed at higher levels in cancer tissues (HK2 [hexokinase 2], PGK1 [phosphoglycerate kinase 1], ENO1 [enolase 1], SLC16A3 [monocarboxylate transporter 4], CA9 [carbonic anhydrase 9]). These features are in accordance with the general concept that cancers experience hypoxia and shift energy production to glycolysis [30]. Expression of the tumor suppressor gene PTEN was lower in the cancer cluster, as was the expression of androgen receptor (AR). EpCAM (epithelial cell adhesion molecule) and the PSMA-coding gene FOLH1 were expressed at relatively high levels in the cancer group.

Table 2 Gene expression in cervical cancer versus normal cervix (p < 0.05)

In addition to gene expression, targeted RNA sequencing also allows for variant detection. The c.187G>A (p.Val63Met) variant in PMS2, which encodes a mismatch repair endonuclease, was detected in 15/18 cancer tissues and in 1/4 normal cervix tissues (not shown). This variant is classified as confirmed somatic in five liver neoplasms in the COSMIC database (https://cancer.sanger.ac.uk/cosmic/analyses) and has a CADD score of 18.5, which is considered likely pathogenic.

Targeted RNA sequencing on cervical scrape samples

In the context of nation-wide cervical cancer screening programs, cervical scrape samples are collected for cytology-based screening, which is increasingly preceded by molecular screening for the presence of high-risk HPV DNA. We therefore tested whether it is feasible to implement targeted RNA sequencing analysis on cervical scrape samples, collected under conditions that are relevant for population-based screening programs (with delays between scraping and analysis). Storage of scrape samples at room temperature for up to 1 week, either dry or in Preservcyt fixative, still allowed the generation of good-quality targeted RNA sequencing profiles (Fig. 3). After 2 weeks of Preservcyt fixation, the quality of targeted RNA sequencing data (expressed as total unique reads obtained in a sample) decreased.

Fig. 3

Quality of targeted RNA sequencing data originating from cervical scrape samples. Data quality is expressed as percentage unique reads relative to 1 h sample preservation (100%). After up to 1 week of sample preservation, either dry or in Preservcyt solution, targeted RNA sequencing generates good-quality transcription profiles from cervical scrapes (n = 2). Total number of unique reads varied per sample (depending on amount of cells in the scrapes and transcriptional activity therein) and started at excellent values of 38,000 and 225,000 for the two samples, respectively, at T = 0
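The quality measure used in Fig. 3 can be sketched as follows, assuming that the total unique read count per storage condition is available for each scrape sample. The function name, the condition labels, and the example numbers in the usage note are hypothetical and do not reproduce the data of this study.

```python
def relative_quality(unique_reads_by_condition, reference="1 h"):
    """Express total unique reads per condition as a percentage of the reference condition."""
    reference_reads = unique_reads_by_condition[reference]
    if reference_reads <= 0:
        raise ValueError("reference condition must have a positive unique read count")
    return {
        condition: 100.0 * reads / reference_reads
        for condition, reads in unique_reads_by_condition.items()
    }
```

For example, a hypothetical sample with 38,000 unique reads at 1 h and 30,400 unique reads after 1 week would score 100% and 80%, respectively.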

Discussion

Infection of cervical tissues with high-risk HPV is not necessarily accompanied by oncogenic activity of high-risk HPV. This is a relevant observation because molecular testing for the presence of high-risk HPV DNA is increasingly implemented in population-based screening programs. High-risk HPV DNA screening is very sensitive (90–95% vs. 30–87% for cytology), objective and allows high-throughput testing [31,32,33] but does not distinguish between deposited and dormant virus, transient infections and clinically relevant active and producing infections. Even with cytomorphology screening as triage test, this results in many unnecessary colposcopy referrals, overtreatment and follow-up. This problem may be solved by measuring transcripts from high-risk HPV E6 and E7 oncogenes in infected tissues [34,35,36,37,38]. A number of studies have now shown the increased specificity of HPV-RNA testing compared with HPV-DNA testing [23, 24, 39].

The aim of this proof of concept study was to investigate the value of targeted RNA sequencing technology to detect high-risk HPV oncogene activity in well-characterized gynecological tissues. We previously showed that targeted RNA sequencing yields reliable quantitative gene expression data and mutation data [27, 29, 40, 41].

During integration into the host genome the reading frame of the early gene E2 is often disrupted [42], alleviating repression of the E6/E7 oncogenes. Concomitant measurement of E2, E6, and E7 transcripts may therefore provide information on high-risk HPV integration status. Until now, comprehensive analysis of high-risk HPV transcripts required at least 44 RT-qPCR experiments per tissue sample (E2, E6, and E7 RT-qPCR for each of the 14 high-risk HPV subtypes, plus a positive and a negative control RT-qPCR), making such analyses labor-intensive. With targeted RNA sequencing, transcripts from the viral genes E2, E6, and E7 from all 14 high-risk HPV genotypes can be quantitatively detected in a single assay, whereas the sequence information concomitantly provides direct and reliable high-risk HPV subtype identification.
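The count of 44 RT-qPCR experiments follows directly from the numbers given above, as the short Python sketch below makes explicit. The function and parameter names are hypothetical.

```python
VIRAL_GENES = ("E2", "E6", "E7")


def rt_qpcr_reactions(n_hpv_types=14, genes=VIRAL_GENES, n_controls=2):
    """One RT-qPCR per gene per high-risk HPV type, plus positive and negative controls."""
    return n_hpv_types * len(genes) + n_controls


# 14 types x 3 genes = 42 reactions, plus 2 controls = 44 per tissue sample
print(rt_qpcr_reactions())
```

The same calculation shows how the workload of a per-transcript approach grows linearly with every additional HPV type or viral gene, whereas a probe panel measures all targets in one reaction.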

We observed E6/E7 gene expression in the absence of E2 in ~30% of cases, suggesting HPV integration in the host genome. In the remaining cancers, E2 and E6/E7 genes were coexpressed. There are several possible explanations for this finding. E6/E7 expression levels may have become independent of E2 repression, e.g., via methylation of the E2 binding sites [12, 14, 15, 42, 43], or analyzed tissues may have been composed of mixed episomal and integrated viruses (either in the same cells or in different cell clones). Our data also demonstrate that multiple high-risk HPV types can be present in cancer tissues whereas only one of these subtypes is responsible for E6/E7 oncogene expression. From this study we cannot conclude whether these cases concerned coinfected cells or multiple cell clones monoinfected by different high-risk HPV types.

Targeted RNA sequencing analysis has multiple other advantages compared with standard techniques. The test uses Illumina barcoding technology and therefore can be performed in parallel for as many samples as there are barcoded primers (currently 386, but expandable), making the test high-throughput and highly cost-effective. A further important advantage is that the current test-format measures expression levels of a number of potentially cancer-associated genes and differentiation genes that are associated with normal physiology. This is highly relevant since host genes play an essential role not only in the development of cancer but also in prevention thereof [44]. Analyzing expression profiles of gynecological tissues may therefore also allow for the identification of non-HPV-related cancers [2,3,4,5]. This notion is strengthened by our finding that AdCa3 in our cohort is high-risk HPV-negative, but still groups with the two high-risk HPV-positive adenocarcinomas in unsupervised clustering analysis. Furthermore, supervised analysis of targeted RNA sequencing data from normal tissues and cancers identified a number of genes that are differentially expressed between cancers and normal tissues. Genes upregulated in cancers included genes involved in metabolism, consistent with increased hypoxia in cancerous tissues. Other genes that were expressed at higher levels in cancers were EpCAM and FOLH1, confirming previous reports [45, 46], and the proto-oncogene MET. Relatively low expression levels of AR (androgen receptor) in cancer were also in accordance with previous reports [47, 48] and illustrate that functions present in tissues with normal physiology may be lost in cancers. Further clinical research and computational biology on targeted RNA sequencing profiles from large numbers of samples in validation sets should confirm whether gene expression profiles can be further refined and translated into transcript classifiers that define a risk for cancer development.

Of particular interest, targeted RNA sequencing also allows investigation of potential associations of gene mutations with cervical cancer. Such profiles may be of additional importance for risk assessment of cervical biopsies. Our finding that mutations in the DNA repair protein PMS2 are associated with the cancers in our cohort suggests that PMS2 mutations may play a role in HPV-induced carcinogenesis, inducing genetic instability as a first step in the carcinogenic process. PMS2 mutations are common in Lynch syndrome and a recent publication showed a slightly increased risk of developing endometrial cancer in women with PMS2 mutations [49]. Our results suggest that this PMS2 variant could possibly be a marker for cervical cancer. However, larger studies are needed to further investigate this.

Whereas this proof of concept study was performed on surgically obtained tissues, we show here that targeted RNA sequencing also works for cervical scrapes, even scrapes that have been stored dry or in Preservcyt fixative at room temperature for up to a week. This is an important finding as it illustrates that targeted RNA sequencing may allow risk assessment of women for developing gynecological cancers based on noninvasive analysis of scrapes. Including probes to detect somatic mutations such as those recently identified [25] may further increase the value of such risk assessment. These studies are now ongoing.