Development and external validation of nomograms in oropharyngeal cancer patients with known HPV-DNA status: a European Multicentre Study (OroGrams)

Background The proxy marker for human papillomavirus (HPV), p16, is included in the new AJCC 8th/UICC 8th staging system, but due to incongruence between p16 status and HPV infection, single biomarker evaluation could lead to misallocation of patients. We established nomograms for overall survival (OS) and progression-free survival (PFS) in patients with oropharyngeal squamous cell carcinoma (OPSCC) and known HPV-DNA and p16 status, and validated the models in cohorts from high- and low-prevalent HPV countries. Methods Consecutive OPSCC patients treated in Denmark, 2000–2014 formed the development cohort. The validation cohorts were from Sweden, Germany, and the United Kingdom. We developed nomograms by applying a backward-selection procedure for selection of variables, and assessed model performance. Results In the development cohort, 1313 patients, and in the validation cohorts, 344 German, 503 Swedish and 463 British patients were included. For the OS nomogram, age, gender, combined HPV-DNA and p16 status, smoking, T-, N-, and M-status and UICC-8 staging were selected, and for the PFS nomogram the same variables except UICC-8 staging. The nomograms performed well in discrimination and calibration. Conclusions Our nomograms are reliable prognostic methods in patients with OPSCC. Combining HPV DNA and p16 is essential for correct prognostication. The nomograms are available at www.orograms.org.


INTRODUCTION
In most parts of the Western world, the main risk factor for oropharyngeal squamous cell carcinoma (OPSCC) is now infection with high-risk human papillomavirus (HPV); while, a smaller proportion is related to a high consumption of alcohol and smoking tobacco. [1][2][3][4] Patients with HPV-associated OPSCCs have improved survival probably related to a different mutational profile, 5,6 immune response [7][8][9] and clinical features. 10 p16 overexpression is a proxy marker for HPV-driven carcinogenesis which is the main prognostic factor in patients with OPSCC. Consequently, p16 was included in the newly proposed American Joint Committee on Cancer/Union for International Cancer Control (AJCC-8/UICC-8) staging system. However, an estimated 10-20% of all OPSCCs are p16-positive, but HPV-negative, due to alternative cellular events leading to p16 overexpression 11,12 being most apparent in oropharyngeal nontonsillar, non-base of tongue cancer. 13 Hence, it may be suboptimal to stratify patients based on evaluation of a single biomarker (i.e. p16 alone) due to the risk of misclassification of tumours and thereby misallocation of patients with an undesired prognosis. 14,15 The combination of HPV-DNA and p16 status has shown better prognostication. 16 Available nomograms so far for patients with OPSCC do not include combined HPV-DNA and p16 status, and models have not been externally validated across areas with high and low HPV prevalence. 17,18 A nomogram is a graphical illustration of a statistical model for calculating the cumulative effect of several variables on a particular outcome, and nomograms have been developed to predict clinical endpoints for patients with several types of malignancies. In this study, we aimed to identify OPSCC-and patient-related factors associated with OS and PFS, and to construct and externally validate predictive nomograms. Moreover, this is the first study addressing patients treated for an OPSCC encompassing high and low HPV-prevalent countries in validation cohorts, and incorporating the newly published AJCC-8/ UICC-8 staging system refining prognostication by employing both HPV-DNA and p16 status.

MATERIALS AND METHODS
Patient cohorts and determination of p16 overexpression and presence of HPV DNA The development cohort. Consecutive patients diagnosed with OPSCC and treated with curative intent in eastern Denmark between 2000 and 2014 were included in the development cohort. 1,19,20 Using the unique resident code from the Danish Civil Registration System, we linked the Danish Head and Neck Cancer Group (DAHANCA) 21 database and the Danish Pathology Data Registry (DPDR), 22 to identify patients. Patient characteristics were retrieved from these databases as well as from medical records. Curative radiotherapy regimens consisted of 66-68 Gy, divided into 33-34 fractions given 6 days a week. From 2007, stage III-IV (UICC 7th) patients were offered concurrent chemotherapy (primarily weekly cisplatin 40 mg/sqm), if tolerated, whilst a minority were treated with cetuximab.
An expert head-and-neck pathologist re-validated a haematoxylin-eosin (H&E)-stained section of each tumour. p16 staining was considered positive if there was a strong and diffuse nuclear and cytoplasmic reaction in more than 70% of the tumour cells. 23 Immunohistochemistry for p16 was performed using the Ventana Benchmark Ultra autostainer with the UltraView detection kit and the p16 monoclonal antibody E6H4 ready-to-use with CC1 as a pretreatment (Roche, Tuscon, USA).
DNA was isolated from two to four 10-μm sections using the DSP DNA Mini Kit and the QIAsymphony SP kit (Qiagen, Hilden, Germany), according to the manufacturers' instructions. HPV DNA PCR was performed using the general primers GP5+/6+ and Platinum Taq DNA polymerase (Invitrogen, Naerum, Denmark). All GP5+/6+ PCR negative samples were subject to a GAPDH (housekeeping gene) PCR to confirm DNA quality. HPV DNA amplicons were run on the QIAxcel Advanced System using the QX DNA Screening Gel (Qiagen, Hilden, Germany) according to the manufacturer's instructions. The expected amplicon sizes werẽ 150 base pairs (bp) for GP5+/6+ and 200 bp for GAPDH. Negative samples were resolved on a 2.5% agarose gel stained with ethidium bromide to compare the sensitivity of the current assay with the standard. Approximately 50% of all samples were analysed by HPV PCR and the HPV+ cases were subsequently sequenced for HPV typing. 1 The remaining samples were analysed by HPV PCR. 19,20 The GP5+/GP6+ primers used 24 are known to amplify at least 37 mucosal HPV types, 25  The German cohort was classified using p16 immunohistochemistry (CINtec histology, Roche mtm laboratories) and highrisk HPV-DNA detection by PCR followed by bead-based hybridisation (Luminex Technology, Multimetrix, Progen, Heidelberg, Germany). 26 This assay detects the following 24 HPV types: 16,18,31,33,35,39,45,51, 52, 56, 58, 59, 68, 73 and 82, three putative high-risk types (26, 53 and 66) and six low-risk types (6, 11, 42, 43, 44 and 70).
The UK cohort was classified using p16 immunohistochemistry (CINtec histology, Roche mtm laboratories) and high-risk HPV DNA in situ hybridisation (Inform HPV III, Ventana Medical Systems Inc.). 3  Overexpression of p16 (>70% positive staining) was classified similar to the Danish cohort for all three validation cohorts.
Treatment modalities for the validation cohorts are presented in more detail in Table 1.

Statistics
Covariates available for adjustment are described in Table 1. Age was included as a continuous variable in the analyses, and the remaining variables were included as categorical variables. Overall survival (OS) was defined as the time from diagnosis of OPSCC to death from any cause. Progression was based on a biopsy or relevant imaging and progression-free survival (PFS) was defined as the time from diagnosis of OPSCC to time of progression at any site or death from any cause. Patients were censored at the last date of follow-up, or administratively censored 5 years after diagnosis. Kaplan-Meier curves were used to illustrate survival differences and significant differences were assessed with log-rank tests.
To evaluate which covariates influenced survival, we fitted multivariate Cox regression analyses with all factors in Table 1 except treatment included (full model), and ties were handled with the method suggested by Efron. Subsequently, we simplified the full models using a stepwise backward-elimination procedure  with Akaike's information criteria as stopping criteria (final model) using the R package rms and the function fastbw. 27 All models are multivariable, i.e. factors are mutually adjusted, and thus the effect estimates cannot be interpreted marginally. In a subanalysis, we evaluated the effect of fitting a spline for age in the development model. This sub-analysis was for the nonlinear part of the spline considered as non-significant; e.g. OS (p = 0.92) and PFS (p = 0.85).
To test whether the assumption of proportional hazards was violated, we tested for trends in the scaled Schoenfeld residuals of the final models. 28 None of the final models violated the proportional hazards assumption. Based on the final models, nomograms were constructed to predict overall survival and PFS at 1, 3 and 5 years after diagnosis. We considered only complete cases (e.g. patients were excluded from the analysis in the case of missing information from one or more variables). p values less than 5% were considered significant and all analyses were performed in R version 3.0. 3. 29 Validation and calibration of multivariate Cox regression models We conducted external validation by applying our nomograms to the patient cohorts from Sweden, United Kingdom and Germany. We assessed nomogram model performance by examining overall accuracy (Brier score), 30 calibration plots 31 and discrimination (Harrell's C index). 32 In addition, we fitted a Weibull calibration model, as suggested by van Houwelingen and Putter, in which shifts in baseline cumulative hazard (obtained from the final Cox models), the effect of the prognostic index (the linear predictor in the Cox model) and the shape of the cumulative baseline hazard were tested. 33 A smoothed version of the cumulative baseline hazard was used in the calibration model, where smoothing was done by linear interpolation.
The backwards elimination procedure left the model for overall survival unchanged, i.e. the model included age, gender, combined HPV and p16 status, smoking, T-, N-, and Mclassification and UICC-8 staging ( Table 3). The OS nomogram   was used to predict the probability of death due to any cause at 1, 3 and 5 years after diagnosis (Fig. 2). Figure 3 shows the calibration plots for internal and external validation at 1, 3 and 5 years. The calibration model showed that the log cumulative baseline hazard was shifted by −0.51 (95% CI:  Fig. S1 and histograms of the linear predictor plots are shown in Suppl. Fig. S2. Progression-free survival In total, 540 (41%) patients in the development cohort experienced disease progression or death, with 187 (24%) patients in the HPV+/p16+ subgroup vs. 274 (66%) patients in the HPV-/p16subgroup (p < 0.001). Crude cumulative incidence of progression or death in the development cohort was at 5 years, 28% (95% CI 25-32%) for the HPV+/p16+ patient group and 71% (95% CI 66-75%) for the HPV-/p16-patient group (Fig. 4). In the validation cohorts, 208 (61%) of 344 patients in the German, 162 (32%) of 503 in the Swedish and 158 (34%) of 463 in the UK cohort developed disease progression or death. Follow-up times are given in Suppl. Table S1. In the multivariable model, the AIC backward-selection procedure led to exclusion of the variable UICC-8 staging and inclusion of the covariates age, gender, combined HPV and p16 status, smoking, T-, N-and M-classification ( Table 4).
The nomogram for prediction of PFS at 1, 3 and 5 years is shown in Fig. 5 The specification of the model for all cohorts appeared correct with confidence intervals all including unity. Calibration plots for internal and external validation of PFS are shown in Fig. 6, Brier plots in Suppl. Fig. S3 and histograms of the linear predictor plots in Suppl. Fig. S4.

DISCUSSION
This study presents multinational-validated nomograms for OS and PFS for patients with OPSCC. One of the main findings includes the identification of combined HPV-DNA and p16 status as an important and independent predictor for OS and PFS. The nomograms performed well in external validation across areas with high and low HPV prevalence. These models may facilitate discussions in clinical settings and aid in identifying lower-risk patients who could be candidates for de-escalation therapy, as well as higher-risk patients eligible for treatment-escalation trials. The online nomogram (www.orograms.org) can be used for more precise calculations than drawing lines on the nomogram. The significance of the double biomarker can be exemplified in a typical patient case of a male, 60 years of age, nonsmoker, and classified as T2N2M0, UICC-8 stage II. If the tumour is HPV+/p16+, the 3-and 5-year OS estimates are 90% and 84%, respectively. However, if the tumour is HPV-/p16+, the 3-and 5-year OS estimates fall to 72% and 60%, respectively. Similar reductions are seen in PFS estimates when comparing HPV+/p16+ with HPV-/ p16+ tumours. Although these numbers are estimates, they underline the importance of evaluating patient prognosis using the combined biomarker of HPV and p16.
Notably, HPV+/p16+ patients with T1-T2 and N1 tumours could be considered candidates for de-escalation therapy, as their survival is similar to the background population, 34 and this might avoid some of the morbidity associated with therapy. Our models   also encourage studies to better understand whether HPV+/p16+ patients with N2 and N3 tumours are eligible for de-escalation as well. Notably, at least nine de-escalation treatment trials are ongoing or finishing. 35 Our nomograms are likely to be applicable to these and future trials, as we report similar 5-year survival or progression rates as in North America, [36][37][38] Western, [39][40][41] Southern, 42 and Northern Europe, 2,43 Australia 44 and China. 45 One of the strengths in this study is the joint use of HPV and p16 for scoring tumours. Other advantages are the large sizes of the development and validation cohorts, all from areas with universal, tax-financed health-care systems diminishing selection bias. In a previous smaller study in a region with very low HPV prevalence (<20%), the development cohort was not from a population-based, non-selected setting when constructing nomograms for OS and PFS in OPSCC patients. 17 A recently published nomogram from the United States, also with smaller cohorts, mainly included patients from private hospitals, and did not include the important double biomarker of HPV and p16. 18 This study also had difficulties in showing the significance of p16 alone for, e.g. PFS.
In this study, we chose overall survival instead of diseasespecific survival as a primary endpoint because it represents the cumulative effect of competing diseases, treatment morbidity and age on patient survival. As disease progression is associated with significantly poorer outcome and consequently a decrease in quality of life, we developed a nomogram with PFS as the endpoint. The PFS nomogram therefore complements the overall survival nomogram well.
Although our training cohort is population based and selection bias is minimised, the nomograms have limitations. With respect to accuracy, the CIs at the various predicted probabilities of recurrence should be considered if using these nomograms in clinical settings. The final models performed well in calibration and discrimination, but the level (risk of outcomes) is--as expected--not identical across cohorts. This is most evident in the German cohort, and this risk should be taken into account when using the models. The German cohort might perform worse due to several factors; partly a significantly lower HPV prevalence, higher smoking, higher share of patients who experience progression and a greater share of patients in advanced stage (e.g. stage 3 and 4). Although this is adjusted for, it should be considered whether these nomograms are best suitable in HPVhigh-risk areas. A possible other bias in this model is the number of censored patients, as observed in the crude-survival analysis. The Danish, UK and Swedish centres have~60-70% censorship opposed to the German with merely 45% censored patients.
These nomograms are only applicable for patients who underwent evaluation at multidisciplinary head and neck cancer centres, as performance of the nomograms is likely to be worse for patients who do not attend multidisciplinary evaluation.
p16 overexpression is a standard surrogate biomarker for highrisk HPV-associated OPSCC. 23,46,47 But notably, p16 overexpression is also present in a number of non-HPV-driven tumours probably related to RAS and BRAF mutations 48 although not in KRAS. 49 Other head and neck carcinomas have proven to be HPV-/p16+ likely related to misconfigurations in the p16-Rb-cyclin-D1 pathway inducing cell cycle activation in HPV-negative carcinomas. 50 The prevalence of HPV-associated OPSCC ultimately depends on the sensitivity and specificity detection method employed, and using different methods between cohorts as in this study, might lead to discordant results in HPV/p16 testing. All four centres employed different p16 and HPV-testing tools, which is a potential shortcoming. All methods cover the most relevant HR-HPV types (HPV 16,18,31,33,35,39,45,51, 52, 56, 58, 59 and 66). However, the Ventana in situ method (UK cohort) is highly sensitive but less specific opposed to amplification of HPV DNA by general primers (GP5+/6+) with presumed high sensitivity and specificity (Danish, Swedish and German cohort), and the subsequent detection of the PCR products with typespecific probes, e.g. bead-based multiplex, might differ. A limitation of this study is also the use of different p16 antibodies across centres, potentially leading to a discrepancy in p16 positivity. Notably, more than 90% of all tumours in this study are examined with the use of the same antibody (clone E6H4). Preferably, a subset of tumours should be tested with all methods to uncover potential shortcomings.
In conclusion, we developed and validated nomograms for OPSCC patients and known HPV-DNA and p16 status. The nomograms are applicable for both high and low HPV areas. Combining HPV-DNA and p16 status is essential for accurate prognostication. Future work might focus on validating our results and incorporating additional prognostic factors, including nomograms specific for salvage treatment for relapsed disease, as well as including outcome measures which have shown to influence outcome (i.e. weight loss, education and anaemia) and outcomes such as histopathological evaluations.