Background

While human papillomavirus (HPV) is a necessary cause of cervical cancer, several authors have reported a proportion of invasive cervical cancers as being HPV negative. The range of HPV negativity varies between studies: in some cervical cancer cohorts, just 0.03% of cervical tumours have no HPV detected, while others report up to 15% of cervical cancers being HPV negative.1,2,3

Possible explanations for not finding HPV among cervical cancers include: cancers independent of HPV (rare true negatives), loss of HPV genomes, cancers caused by HPV types not tested for, misclassification of cancers (e.g. metastasis from other tumours or cancers of corpus uteri misclassified as cervical) and failure of HPV detection methods.1,2,3,4

To optimise cervical screening, the risk of false negativity in HPV screening needs to be minimised at all stages. HPV-test negative cervical cancers may constitute a biologically distinct subgroup, associated with symptomatic detection, late-stage diagnosis and worse prognosis5 that may need different targeted therapeutic strategies.6

Common methods to detect HPV are based on PCR and probe hybridisation, usually targeting the L1 gene, which is the most conserved gene within the HPV genome.7 These PCR methods are both efficient and sensitive but are unable to detect HPVs that do not bind specifically to designed primers and probes. An HPV genotype that diverges in its genomic sequence from the sequences designed for primers/probe may escape amplification and/or hybridisation, and therefore remain undetected.8,9,10 HPV detection failure can be readily circumvented by subjecting specimens to unbiased high throughput sequencing of the total nucleic acids of the sample, as HPV can then be detected without prior knowledge/assumptions of which genotype-specific sequences may be present.8,11,12 Furthermore, if cDNA is sequenced, the data will show if there is viral transcriptional activity, which is typically essential for both initiation and maintenance of the malignant phenotype.

We thus aimed to assess whether unbiased deep sequencing might detect HPV among cervical cancer specimens testing HPV negative by PCR.

Methods

Sample collection

HPV genotyping was previously performed based on L1 amplification and subsequent probe hybridisation on formalin-fixed paraffin-embedded (FFPE) blocks from all cervical cancers occurring during a 10-year period (2002–2011) in Sweden.13 In case of HPV negativity, the slide from each case-block was re-reviewed by a pathologist to confirm presence of tissue representative of cervical cancer. FFPE cervical blocks were thereafter subjected to real-time PCR (rt-PCR) for HPV 16 and 18 targeting E6 and E7 genes of the HPV genome. Out of the 2850 cervical cancer cases, we reported that almost 14% (394/2850) were HPV negative by PCR.13

For this study, we included all HPV negative FFPE blocks (n = 394) together with 59 randomly selected HPV positive blocks to be used as positive controls, to assess the overall presence of HPV using an unbiased (i.e. not based on PCR) approach with deep sequencing. As negative controls, we selected all corresponding empty paraffin blocks, “blank blocks”, that had been sectioned in-between each cervical cancer FFPE block (n = 453).

A total of 34/59 HPV positive blocks and 92/394 cancers negative by HPV PCR had been already sequenced as part of a methods optimisation paper.14 To obtain a comprehensive result of the HPV status in invasive cervical cancers in Sweden during a 10-year period, the previous PCR data and the previous sequencing data were combined with the new sequencing performed here.

Deep sequencing

All FFPE blocks (n = 906) were extracted as previously described.15 Cervical FFPE blocks were divided randomly into 7 groups for sequencing. The blank blocks corresponding to the cases in each group were divided in two and pooled into 2 negative controls. As a result, each group contained approximately 65 cervical blocks (both HPV positive and HPV negative) and 2 negative controls, each being a pool of extracted material from half of the empty paraffin blocks corresponding to the FFPE cases within that group.
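The allocation described above can be illustrated with a minimal sketch. The function names, the `blank_` label prefix and the fixed random seed are hypothetical and only serve the illustration; the original randomisation procedure is not described in further detail.

```python
import random


def allocate_groups(case_ids, n_groups=7, seed=0):
    """Shuffle case blocks and deal them into n_groups sequencing groups."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::n_groups] for i in range(n_groups)]


def pool_blanks(group, n_pools=2):
    """Split the blank blocks matching one group's cases into n_pools pools."""
    blanks = [f"blank_{case_id}" for case_id in group]  # hypothetical labels
    size = (len(blanks) + n_pools - 1) // n_pools
    return [blanks[i:i + size] for i in range(0, len(blanks), size)]


# 453 cervical blocks (394 PCR-negative + 59 PCR-positive controls)
groups = allocate_groups(range(1, 454))
pools = [pool_blanks(group) for group in groups]
print([len(group) for group in groups])       # group sizes of 64 or 65
print(sum(len(pair) for pair in pools))        # 14 negative-control pools
```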

For each sample, 8 μl of extracted material was reverse-transcribed, indexed and rRNA-depleted following the SMARTer® Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian library preparation guide (Takara, US), omitting the fragmentation step. The libraries were validated, normalised to 2 nM and pooled according to the 7 groups division before sequencing. RNA seq was performed on NovaSeq 6000 system (Illumina, USA) at 2 × 150 bp, in seven different runs (one run per group), aiming for 30 M paired end reads/sample.
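As an illustration of the normalisation step, the sketch below converts a library concentration to molarity and computes the dilution needed to reach 2 nM. It assumes double-stranded libraries with an average mass of 660 g/mol per base pair; the input concentration, mean fragment length and final volume are hypothetical values, not measurements from this study.

```python
def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Molarity (nM) of a dsDNA library from its mass concentration and mean size."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def dilution_volumes(stock_nM, target_nM=2.0, final_ul=20.0):
    """Volumes (library, diluent) in µL needed to reach target_nM in final_ul."""
    if stock_nM < target_nM:
        raise ValueError("library is already below the target molarity")
    library_ul = target_nM * final_ul / stock_nM
    return library_ul, final_ul - library_ul


# Hypothetical library: 4 ng/µL with a mean fragment size of 350 bp
stock = library_molarity_nM(4.0, 350)
library_ul, diluent_ul = dilution_volumes(stock)
print(f"{stock:.2f} nM stock: mix {library_ul:.2f} µL library with {diluent_ul:.2f} µL diluent")
```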

Bioinformatics

Indices, included in the Illumina adaptors, were used to assign raw sequence reads obtained from the NovaSeq 6000 (Illumina) platform to the originating samples. Reads were quality- and adaptor-trimmed with Trimmomatic16 using default parameters and 18 bp as minimal read length. The first 3 nucleotides from every R2 read were trimmed, as advised for the SMARTer pico kit used for library preparation, and thereafter high-quality paired reads were screened against the human reference genome GRCh38 using NextGenMap v.0.5.2.17 Reads were filtered out if they aligned with >95 % identity over 75 % of their length to the human genome. Non-human reads were queried against all HPV protein sequences included in the PaVE database (Papillomavirus Episteme, accessed on 28 July 2019, including all protein sequences from HPV reference and non-reference genomes), using the open source software DIAMOND18 blastx with default parameters and --top 1. The cut-off for HPV positivity was a minimum of 10 reads detected for any HPV type with at least 90 % identity, together with coverage of >10% of that HPV genome (approx. 790 bp).
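The human-read filter and the positivity cut-off can be expressed as a minimal sketch. It is a simplification under stated assumptions: hits are taken to be already translated to nucleotide coordinates on the HPV genome (the actual search was a blastx query against protein sequences), "over 75 % of their length" is read as "at least 75 %", and a genome length of 7900 bp is inferred from the stated 10 % being approx. 790 bp. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HpvHit:
    hpv_type: str    # e.g. "HPV33"
    identity: float  # percent identity of the read's best hit
    start: int       # 1-based start on the HPV genome (assumed coordinates)
    end: int         # 1-based inclusive end


def is_human(identity, aligned_len, read_len):
    """Human filter: >95 % identity over at least 75 % of the read length."""
    return identity > 95.0 and aligned_len >= 0.75 * read_len


def covered_bases(intervals):
    """Number of positions covered by a set of inclusive intervals."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def call_hpv(hits, genome_len=7900, min_reads=10, min_identity=90.0, min_cov=0.10):
    """Return HPV types with >= min_reads reads and > min_cov genome coverage."""
    by_type = {}
    for hit in hits:
        if hit.identity >= min_identity:
            by_type.setdefault(hit.hpv_type, []).append((hit.start, hit.end))
    positive = []
    for hpv_type, intervals in by_type.items():
        coverage = covered_bases(intervals) / genome_len
        if len(intervals) >= min_reads and coverage > min_cov:
            positive.append(hpv_type)
    return sorted(positive)
```

Requiring both a read count and a coverage fraction means that many reads stacked on one short region do not, on their own, make a sample positive.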

Analysis of the MGP region

Cervical samples that were positive by deep sequencing for HPV genotypes that should have been, but were not, previously detected by PCR were subjected to further analysis to elucidate reasons for the false negativity. We analysed the genomic sequence and coverage of HPV reads within the MGP region in the L1 gene, which is the region targeted for genotyping by the modified general primer (MGP) PCR system,19 as well as possible mismatches with the primers and probes used in the genotyping method.

Non-human reads were aligned to an HPV database comprising all HPV genomes officially established by the International HPV Reference Center (n = 221, https://www.hpvcenter.se, accessed on 2020–01–20), together with complete genome sequences from HPV types that are not officially established yet (n = 222, https://pave.niaid.nih.gov, accessed on 2020–01–20), using NextGenMap,17 with >90% identity over 75% of read length as alignment parameters. HPV-aligned reads from each sample were thereafter subjected to visual inspection using Integrative Genomics Viewer to confirm mapping of reads within the corresponding genotype and to assess coverage and the presence of SNPs.

Confirmation of presence of HPV types by rt-PCR

Rt-PCR was used to confirm presence/absence of HPV types in case of discrepancies between PCR-based genotyping and deep sequencing or when a blank block had tested positive. When a pool of blank blocks had tested positive, all cervical tumour samples, together with all individual empty paraffin blank blocks within that sequencing run, were subjected to rt-PCR. The rt-PCR was performed in a 20 µL reaction volume containing 50–100 ng genomic DNA, 1× TaqMan Universal PCR Master Mix (Applied Biosystems) including 0.5 µM of each forward and reverse primer and 0.25 µM HPV MGB-probe. Water was used as non-template control in each run. The PCR reaction was carried out in ABI 7300 Real-time PCR System, using the Software version 2.0.5 (Applied Biosystems), with the following temperature settings: 10 min at 95 °C, followed by 45 cycles at 95 °C for 15 s and 60 °C for 1 min. The threshold was set to 0.1 ∆Rn, where Rn is the fluorescence of the reporter dye divided by the fluorescence of the passive reference, and ∆Rn is Rn after subtraction of the baseline fluorescence.
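The ∆Rn threshold can be illustrated with a minimal sketch. It assumes a flat baseline estimated as the mean Rn over cycles 3 to 15 and linear interpolation between cycles; instrument software may fit the baseline differently, and the fluorescence values below are invented for the example.

```python
def delta_rn(reporter, passive, baseline_cycles=(3, 15)):
    """Return baseline-subtracted normalised fluorescence for each cycle."""
    rn = [r / p for r, p in zip(reporter, passive)]
    first, last = baseline_cycles  # 1-based, inclusive (assumed window)
    window = rn[first - 1:last]
    baseline = sum(window) / len(window)
    return [value - baseline for value in rn]


def threshold_cycle(drn, threshold=0.1):
    """Fractional cycle at which dRn first crosses the threshold, or None."""
    for i in range(1, len(drn)):
        previous, current = drn[i - 1], drn[i]
        if previous < threshold <= current:
            # drn[i - 1] belongs to cycle i and drn[i] to cycle i + 1
            return i + (threshold - previous) / (current - previous)
    return None


# Invented 45-cycle trace: flat until cycle 20, then rising
passive = [1.0] * 45
reporter = [1.0] * 20 + [1.05, 1.15] + [2.0] * 23
print(threshold_cycle(delta_rn(reporter, passive)))
```

A trace that never reaches 0.1 ∆Rn within the 45 cycles returns `None`, corresponding to a negative reaction.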

When there was a disagreement in genotyping results between the PCR based method and sequencing, the rt-PCR mixtures contained a total of 20 µL: 5 µL sample and 1× of TaqMan Fast Advanced Master Mix (Applied Biosystems) including 0.5 µM of each primer and 0.25 µM HPV probe. Water was used as non-template control in each run. The PCR analyses were carried out in QuantStudio 3, using the QuantStudio Design & Analysis Software v.1.4.3 (Applied Biosystems), with the following temperature settings: 20 min at 95 °C, followed by 45 cycles at 95 °C for 1 min and 60 °C for 20 s.

Results

Among all the cases of invasive cervical cancer in Sweden over a 10-year period, we could perform HPV genotyping with PCR and/or deep sequencing for 2850 cases. All FFPE tumour blocks and their corresponding blank paraffin blocks were first subjected to HPV genotyping. If HPV negative (n = 483), the FFPE blocks were then subjected to HPV 16 and 18 rt-PCR targeting E6 and E7 genes of the HPV genome. If still HPV negative (n = 394), the samples were then deep sequenced following an unbiased approach, not based on PCR. A CONSORT diagram can be found in Supplementary Fig. 1.

The median age at cancer diagnosis of the HPV negative cervical cases (n = 394) was 62.5 years (range 24–95 years); most women were diagnosed at age 60 or above. Sixty-five percent of the HPV negative cervical cancers were squamous cell carcinoma, 23 % were adenocarcinoma and 12 % were less common histological types, such as adenosquamous cell carcinoma and other rare types. Compared to cases positive for HPV, HPV negative cases were more likely to be diagnosed at older age and more advanced stage (Table 1).

Table 1 Characteristics of women with a primary invasive cervical cancer diagnosis 2002–2011 in Sweden by HPV status.

Two specimens that had been HPV-negative by PCR had too little material available for sequencing, and thus only 392/394 HPV-negative cancers were sequenced. We also sequenced 59 HPV PCR-positive cervical cancers, selected at random, for comparability of methods.

The NovaSeq sequencing generated high-quality sequencing data, with a median of 2898 million paired reads per run and 30 million paired reads per sample. Most reads (median 96%) were classified as human sequences. We detected HPV sequences in 54/59 HPV PCR-positive cancers, with total concordance regarding the HPV type detected in 46/59 (77.97%) specimens (Table 2), partial concordance in 3/59 specimens (5.08%), and HPV positivity but with a different genotype than detected by PCR in 5/59 (8.47%) specimens (Table 2). Five specimens were HPV negative by sequencing.
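The concordance categories can be made explicit with a short sketch. The category definitions are an assumed reading of the text (total concordance when both methods report the same set of types, partial concordance when the sets overlap without being identical); the counts are those reported above.

```python
def concordance(pcr_types, seq_types):
    """Classify agreement between PCR and sequencing genotype calls."""
    pcr, seq = set(pcr_types), set(seq_types)
    if not seq:
        return "negative by sequencing"
    if pcr == seq:
        return "total concordance"
    if pcr & seq:
        return "partial concordance"
    return "different genotype"


def percentages(counts, total):
    """Percentage of the total for each category, rounded to two decimals."""
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


reported = {
    "total concordance": 46,
    "partial concordance": 3,
    "different genotype": 5,
    "negative by sequencing": 5,
}
assert sum(reported.values()) == 59  # all PCR-positive controls accounted for
print(percentages(reported, 59))
```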

Table 2 HPV types detected with RNAseq in PCR positive specimens.

HPV sequences were detected in 169/392 HPV PCR-negative specimens, with all samples presenting a single infection. HPV 33, 73, 31 and 45 were detected in 117/392 of all HPV PCR negative cervical cancer cases (58/392, 23/392, 19/392 and 17/392 specimens, respectively) (Table 3).

Table 3 HPV detection using RNAseq in HPV PCR negative specimens.

We combined the deep sequencing results from this paper with previously reported results13,14 to provide a comprehensive overview of the presence of specific HPV types in cervical cancers in Sweden. Some type of HPV was found in 92.17% (2625/2848) of all cervical cancer cases (Table 4). A total of 30 different HPV types were detected, but only five types (HPV 16, 18, 45, 33 and 31) were found in >3% of cancers (Table 4).

Table 4 HPV status in 2850 cervical cancer cases from a nationwide audit procedure, listed by type (single infections only), multiple, negative and not tested respectively, as defined by PCR, and/or RNA sequencing results.

Analysis of the MGP region

One hundred and sixty-four cervical cancers HPV-negative by PCR (164/169, 97.04%), together with 5/59 (8.47%) previously classified as positive with HPV PCR, presented HPV genotypes that should have been previously detected by the PCR-based method but were not. Analysis of the sequencing reads present in the MGP region in the L1 gene (the PCR region targeted for genotyping) revealed that 109/169 cancers had reads covering the MGP region, with 72/109 samples showing a sequence identical to that of the corresponding HPV genotype. Samples whose sequences differed from the reference within the PCR-targeted region (37/109) showed no more than two point substitutions in the forward primer, probe or reverse primer binding regions when compared with the reference genotype sequence. Most of them (23/37) had only a single substitution in one of the three binding regions (Table 5).
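Counting point substitutions within the primer and probe binding regions can be sketched as follows. The sequences and region coordinates are toy values, not real MGP primer or HPV sequences, and treating `N` as an uncalled base that is not counted as a substitution is an assumption of this sketch.

```python
def count_substitutions(reference, observed):
    """Number of positions where observed differs from reference (N ignored)."""
    if len(reference) != len(observed):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), observed.upper())
    return sum(1 for ref, obs in pairs if ref != obs and obs != "N")


def region_mismatches(reference, sample, regions):
    """Substitution count per binding region; regions are 0-based half-open."""
    return {
        name: count_substitutions(reference[start:end], sample[start:end])
        for name, (start, end) in regions.items()
    }


# Toy example with hypothetical coordinates for the three binding regions
reference = "ACGTACGTACGTACGTACGT"
sample = "ACATACGTACGTACGTATGT"
regions = {"forward primer": (0, 6), "probe": (8, 14), "reverse primer": (14, 20)}
print(region_mismatches(reference, sample, regions))
```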

Table 5 MGP coverage in HPV PCR−/RNASeq+ specimens.

Confirmation of presence of HPV types by rt-PCR

We sequenced 14 pools of blank blocks which comprised all empty paraffin blocks that had been sectioned in-between each cervical cancer FFPE block. Twelve pools showed no traces of HPV, 1 pool revealed presence of HPV 85 (a genotype that had not been detected in any of the case specimens of the study) and 1 pool contained 51 reads of HPV 33. Among the 70 samples sequenced within the same run as the HPV 33 positive blank block, there were 16 HPV 33-positive samples.

We therefore subjected all 70 cervical tumours and all 70 blank blocks in this particular run to type-specific real-time PCR targeting HPV 33. Primers and probes for HPV 33 have been previously described.20 The rt-PCR confirmed HPV 33 presence in all reported HPV 33 positive samples (16/16) and also found it in one cervical tumour where it had not been found by sequencing. All the individual empty paraffin blank blocks were negative for HPV 33 by rt-PCR.

In the analyses of the five samples where the PCR and the sequencing had found different genotypes, the rt-PCR found the same HPV genotype as found by the sequencing in all 5 cases.

Discussion

We find that unbiased (not based on methods requiring prior knowledge of the sequences aimed to be detected) deep sequencing of FFPE specimens of invasive cervical cancers provides improved accuracy and detectability, detecting HPV in 169/392 cancers that had been negative by HPV PCR.

Accurate detection of HPV is essential, as false negativity may translate into failure in the HPV-based cervical screening program. All specimens that were included in this study were confirmed by surgical staging, histopathological re-review, and specimen adequacy assessment (beta-globin detection) to rule out the possibility of HPV negativity being due to misclassification of the tumour and/or specimen inadequacy.13,21 While a few of the HPV types detected with deep sequencing (HPV 32, 34 and 38) were not tested for with the PCR detection method used, most of the samples (164/169) revealed HPV types that should have been detected with the PCR based method. Further analysis of the MGP region (region targeted by PCR primers and probe) revealed that many of the PCR HPV negative cervical cancers (60/169) had no sequences present from the L1 region in the samples, which might explain why a PCR method targeting L1 failed to detect the HPV. Thirty-seven samples showed sequence variability in the sequences targeted by primers or probes, although with no more than 2 substitutions in any single primer or probe binding region. For the remaining 72 cancers, we cannot explain why the HPV was not detected before. Our PCR method has a sensitivity of 50 international units of HPV 16 and HPV 18 and 500 genome equivalents for the other oncogenic HPV types,22 suggesting that the difference may be attributable to the higher sensitivity of the sequencing.

The majority of the HPV PCR negative samples that showed HPV presence after being sequenced contained high-risk or probable high-risk HPV types (161/169, 95.27%), and only 8/169 (4.73%) specimens showed presence of low-risk HPV types 32 (n = 1), 34 (n = 3), 38 (n = 1), 42 (n = 2), and 70 (n = 1), in line with the fact that low-risk HPV types are only occasionally found in cervical cancer.

We report that there are still 223/2850 (7.82%) cervical cancers where HPV was not detected. A previous meta-analysis including 30,848 invasive cervical cancers worldwide reported an increased overall HPV prevalence from 85.9% in studies published from 1990 to 1999 to 92.9% in studies published from 2006 to 2010.4 Similar percentages of HPV-negativity in invasive cervical cancers are reported by other authors (7–11%).1,4,21

HPV negativity was diagnosed based on both PCR and next-generation sequencing, and not on immunohistochemistry (cytokeratin 5, pan cytokeratin, protein 63, P16 and P53). In this study, we had pre-specified a cut-off of 10 HPV reads and at least 10% HPV genome coverage for a sample to be considered as HPV positive, to avoid possible false positivity. There were some samples that presented a weak signal of HPV (<10 reads and/or <10% of HPV genome coverage according to our definition). Further studies are needed to elucidate the reproducibility and significance of these.

Factors such as duration of tissue storage, type of cancer histology, and stage of cervical cancer are reported to correlate with the detectability of HPV in cervical tumours.23,24,25,26 The HPV prevalence was indeed very high (96.42%) in stage Ia, compared to 84.13% in stage III+, in line with previous publications suggesting that HPV expression can be lost as the stage of the cervical cancer progresses.23

One retrospective study reported that the duration of tissue storage had a statistically significant impact on HPV detection (p < 0.005), with cases from the last 30 years showing around 24% HPV negativity and older cases (stored for >60 years) presenting a significantly lower HPV detection rate, with >50% HPV negativity.25 Our study comprised tumour blocks obtained during a 10-year period; therefore, storage time is not expected to have caused differences in HPV detection rate among them.

It is also known that HPV negativity is more common in several cervical cancer histologies. While squamous cancers and adenocarcinomas in situ are hardly ever HPV negative,26 adenosquamous tumours show an HPV prevalence of about 86%24 and the HPV prevalence among adenocarcinomas varies between the subtypes.25 The present study showed an HPV prevalence of 95.40% for squamous cell carcinomas, 84.28% for adenocarcinomas, 82.35% for adenosquamous carcinomas and 76.09% for other rare invasive cervical cancers, again in line with previous literature.

As reported previously, we chose to perform reverse transcription of mRNAs without DNAse treatment to obtain a maximally sensitive combination of cDNA and some genomic DNA as well.14 We detected reads with known HPV viral splice junctions in 81/169 (48%) of the HPV-positive cancer specimens (data not shown), showing that at least a part of these FFPE specimens had some viral mRNA present.14

Despite the fact that HPV negative cancers do exist, HPV screening is by far the best option to eliminate cervical cancer. Data from randomised controlled trials are very clear on the very low cervical cancer risk after an HPV-negative screening test.27 The studies in our paper are well in line with the findings that HPV screening has higher sensitivity than any other alternative screening method.

In summary, we report that when cervical cancers are tested by both PCR and deep sequencing, HPV sequences exist in >92% of cervical cancer specimens. Type-specific PCRs targeting retained regions of the HPV genome of the most common HPV types might be useful for optimal sensitivity in HPV screening. In addition, we suggest that deep sequencing of apparently HPV-negative cervical cancers is useful for quality assurance in HPV screening and detection of HPV infections undetected by HPV PCR methods.