ERCC2 gene single-nucleotide polymorphism as a prognostic factor for locally advanced head and neck carcinomas after definitive cisplatin-based radiochemotherapy

Identifying patients with locally advanced head and neck carcinoma at high risk of recurrence after definitive concurrent radiochemotherapy is of key importance for selecting candidates for consolidation therapy and for individualized treatment intensification. In this multicenter study we analyzed recurrence-associated single-nucleotide polymorphisms (SNPs) in DNA repair genes in tumor DNA from 132 patients with locally advanced head and neck squamous cell carcinoma (LadHnSCC). Patients were treated with definitive radiotherapy and simultaneous cisplatin-based chemotherapy at six partner sites of the German Cancer Consortium (DKTK) Radiation Oncology Group from 2005 to 2011. For validation, a group of 20 patients was available. A score selection method based on proportional hazard analysis, combined with leave-one-out cross-validation, was used to identify markers associated with outcome. In the training datasets from all patients, the SNPs rs1799793 and rs13181 were associated with survival; the same SNPs and, in addition, rs17655 were associated with freedom from loco-regional relapse (ffLRR). In comparison to the other genotypes, the homozygote major rs1799793 genotype at the ERCC2 gene was associated with better survival (hazard ratio (HR): 0.418, 95% CI: 0.234–0.744, p = 0.003) and the homozygote minor rs13181 genotype at ERCC2 with worse survival (HR: 2.074, 95% CI: 1.177–3.658, p = 0.017). At the ffLRR endpoint, rs1799793 and rs13181 had comparable prognostic value. The rs1799793 and rs13181 genotypes passed the leave-one-out cross-validation procedure and were associated with survival and ffLRR in patients with LadHnSCC treated with definitive radiochemotherapy. While the findings were confirmed in a small validation dataset, further validation is underway within a prospective biomarker study of the DKTK.


Introduction
Definitive cisplatin-based radiochemotherapy of locally advanced squamous cell carcinoma of the head and neck achieves 5-year survival rates of about 30-50% in patients with HPV-negative tumors treated in prospective trials [1,2]. At such event rates, the radiation dose-response relationship is often steepest and correlates positively with higher radiation doses. At higher or lower event rates, larger samples are needed to develop a precise classifier for progression-free and overall survival [3]. Cisplatin-based radiochemotherapy is one of the standard treatment approaches in locally advanced head and neck carcinoma [4]. Nucleotide excision repair pathways are the main mechanism for repairing cisplatin-DNA adducts [5] and also mitomycin C induced DNA interstrand cross-links [6]. In several retrospective analyses, single-nucleotide polymorphisms (SNPs) in nucleotide excision repair genes as well as in single or double strand break repair genes have been associated with the outcome of head and neck cancer patients treated with radiotherapy (RTX) or radiochemotherapy (RT/CTX) at the clinical endpoints of normal tissue toxicity, tumor response, or survival [7-13]. However, none of these SNPs has so far played any role in clinical routine for treatment selection or prognosis prediction.
A multicentre retrospective biomarker study on patients with locally advanced squamous cell head and neck carcinoma treated with definitive RT/CTX was initiated by the partners of the Radiation Oncology Group of the German Cancer Consortium (DKTK-ROG) with the aim of establishing prognostic and/or predictive biomarkers [14,15]. In the present study, the prognostic value of SNPs in repair genes relevant for the effectiveness of combined cisplatin and radiation therapy was analyzed in a cohort of patients homogeneously treated with definitive radiotherapy and concurrent chemotherapy. A separate group of patients receiving cisplatin-based induction chemotherapy in addition to concurrent radiochemotherapy was used for validation.

Study population and treatment
Patients of the DKTK-ROG biomarker study with locoregionally advanced head and neck squamous cell carcinoma of the oral cavity, oro- and hypopharynx, who were treated with definitive radiotherapy and simultaneous chemotherapy from 2005 to 2011 at six partner sites, were eligible for the present study. This study included 158 patients. The characteristics of the patients were previously described [14]. The following clinical factors describing the extent of disease and general criteria of each patient were obtained before definitive radiochemotherapy: age, gender, lymph node category, p16 expression, tumor site, and the logarithm of the combined gross tumor volume of the primary tumor and involved lymph nodes, log(GTV total) [14]. Patients for whom no genomic DNA, centrally prepared in Dresden from formalin-fixed paraffin-embedded (FFPE) specimens of the primary tumor, was available for translational research had to be excluded (n = 23). In addition, patients without GTV measurements, a major prognostic clinical factor (n = 1), or with missing or equivocal genotype data from SNP analysis (n = 2, sample call rate = 98.5%) were excluded, leaving 132 patients for analysis. Patients documented by the DKTK-ROG who received induction cisplatin-based chemotherapy and definitive radiotherapy with concurrent cisplatin-based chemotherapy during the same time period were eligible as a validation group. Ethical approvals for retrospective analyses of the clinical and biological data were granted by the ethics committees of all DKTK partner sites.
Genotypes were determined with TaqMan allele discrimination assays run on the ABI 7700 Sequence Detection System (Applied Biosystems, Rotkreuz, Switzerland). These assays use the TaqMan 5′-nuclease chemistry to amplify and detect specific polymorphisms in purified genomic DNA samples. Each assay genotypes one SNP and consists of two sequence-specific primers as well as two TaqMan minor groove binder probes with nonfluorescent quenchers. The probes are labeled with VIC and FAM dyes to detect the Allele 1 and Allele 2 sequences, respectively. Genotyping of SNPs was performed 2-3 times (n) for each SNP using the TaqMan allelic discrimination assays (rs50871, C_958480_10 (n = 2); rs1799793, C_3145050 (n = 2); rs13181, C_3145033_10 (n = 2); rs17655, C_1891743_10 (n = 3); rs11615, C_2532959_1 (n = 2); rs2267437, C_15872242_20 (n = 3); rs4988023, C_33307846_10 (n = 2); rs25487, C_622564_10 (n = 2)), all from Thermo Fisher Scientific, USA, on an ABI Prism 7900 HT Sequence Detection System (Applied Biosystems). The following cycling conditions were used: 10 min at 95°C, followed by 45 cycles of 95°C for 15 sec and 60°C for 1 min. About 5 ng of genomic DNA was used per polymerase chain reaction in a volume of 5 μl. The analysis was done using the SDS2.2 software package from Applied Biosystems. The full prognostic genotype information at each SNP locus was coded by two dummy variables, rsSNP-1 and rsSNP-2. The rsSNP-1 variable contrasts the homozygote major genotype against the heterozygote or homozygote minor alternatives, while rsSNP-2 contrasts the homozygote minor genotype against the two other genotypes. This genetic model-free approach, which has also been recommended [21], was selected because there was not enough a priori evidence available to justify a specific genetic model for a given SNP.
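The two-contrast coding described above can be illustrated with a minimal Python sketch. The function name `genotype_dummies` and the convention that the contrast group is coded 1 are assumptions made for illustration only; the source does not state how the variables were coded numerically in the original analysis.

```python
from typing import Tuple


def genotype_dummies(genotype: str, major: str, minor: str) -> Tuple[int, int]:
    """Return (rsSNP_1, rsSNP_2) for a two-letter genotype such as 'GA'.

    Assumed coding (not stated in the source):
      rsSNP_1 = 1 for homozygote major, 0 otherwise
      rsSNP_2 = 1 for homozygote minor, 0 otherwise
    """
    alleles = list(genotype.upper())
    if len(alleles) != 2 or any(a not in (major, minor) for a in alleles):
        raise ValueError(f"unexpected genotype {genotype!r}")
    n_minor = alleles.count(minor)
    snp_1 = 1 if n_minor == 0 else 0  # homozygote major vs. the other two
    snp_2 = 1 if n_minor == 2 else 0  # homozygote minor vs. the other two
    return snp_1, snp_2


# Example with G as major and A as minor allele (as for rs1799793).
codes = {g: genotype_dummies(g, "G", "A") for g in ("GG", "GA", "AA")}
```

Heterozygotes receive (0, 0), so the pair of variables distinguishes all three genotypes without imposing a dominant, recessive, or additive model.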

Outcome definition
The first endpoint of this study was overall survival and the second endpoint was freedom from loco-regional relapse (ffLRR). Survival time and time to loco-regional relapse were measured from the start of radiotherapy until death or loco-regional recurrence, respectively, or until last follow-up.

Statistical analysis
Classifier building and leave-one-out cross-validation
A prognostic six-parameter classifier was built from the genotypes of eight SNPs in seven candidate genes associated with base excision, nucleotide excision, and DNA double strand break repair along with standard clinical covariates. A score selection method for proportional hazard regression with leave-one-out cross-validation (LOO-CV) was used to assure internal validity [22,23]. The six-parameter model with the highest score χ² statistic prevalent in more than 60% of the leave-one-out training datasets was further evaluated. The LOO-CV approach was performed using the SAS macro described by Rushing et al. [24]. The PHREG and LIFETEST procedures of SAS version 9.4 were used [22]. Model selection and classifier calibration were performed on a training dataset, each time leaving the i-th patient out. The i-th patient was then classified as high or low risk depending on the predicted risk score according to the classifier from the training dataset. The maximum number of parameters included in the classifier built on the training dataset was limited to six variables. Under the score option, the PHREG procedure selects the subset of six variables with the highest likelihood score statistic. If fewer than six parameters were selected in more than 60% of the training datasets, the number of covariates included in the classifier was reduced to the number of covariates fulfilling the former criterion. Patients with covariates leading to a hazard higher than the median hazard in the training dataset were classified as high risk. This procedure was repeated for patients i = 1 to N (N = number of patients in this study). Kaplan-Meier survivor functions in the high and low-risk groups were compared with the log-rank test and the Wilcoxon test. In addition, survival and loco-regional recurrence analyses were performed using proportional hazard analysis.
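The control flow of the leave-one-out classification can be sketched in Python as follows. This is only an outline of the loop, not the SAS macro of Rushing et al.: the model-building step (best-subset score selection in PHREG) is replaced by a caller-supplied function `fit`, and `toy_fit`, the `log_gtv` field, and the example cohort are hypothetical placeholders.

```python
from statistics import median
from typing import Callable, Dict, List

Patient = Dict[str, float]
# A risk model maps one patient's covariates to a risk score (higher = worse).
RiskModel = Callable[[Patient], float]


def loo_risk_groups(patients: List[Patient],
                    fit: Callable[[List[Patient]], RiskModel]) -> List[str]:
    """Label each patient 'high' or 'low' risk by leave-one-out cross-validation."""
    labels = []
    for i, held_out in enumerate(patients):
        training = patients[:i] + patients[i + 1:]      # leave patient i out
        model = fit(training)                           # build classifier on the rest
        cutoff = median([model(p) for p in training])   # median risk in training set
        labels.append("high" if model(held_out) > cutoff else "low")
    return labels


def toy_fit(training: List[Patient]) -> RiskModel:
    # Hypothetical stand-in for the Cox model: ignores outcomes entirely
    # and scores each patient by log(GTV) alone.
    return lambda p: p["log_gtv"]


cohort = [{"log_gtv": v} for v in (1.0, 2.0, 3.0, 4.0, 5.0)]
groups = loo_risk_groups(cohort, toy_fit)
```

Because the held-out patient never contributes to the model or to the median cutoff, the resulting high and low-risk labels can be compared with Kaplan-Meier curves without the optimism of a resubstitution estimate.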
The validity of the proportional hazards assumption was assessed by a Kolmogorov-type supremum test (procedure PHREG from SAS).
The Hardy-Weinberg equilibrium of the alleles at an SNP position was analysed using a goodness-of-fit χ² test [23].
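A goodness-of-fit test of this kind can be sketched in a few lines of Python. The function name and the genotype counts in the example are hypothetical; the test uses one degree of freedom (three genotype classes, minus one, minus one estimated allele frequency), for which the χ² tail probability equals erfc(√(χ²/2)).

```python
import math
from typing import Tuple


def hardy_weinberg_chi2(n_major_hom: int, n_het: int, n_minor_hom: int) -> Tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df)."""
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)  # major allele frequency
    q = 1.0 - p                              # minor allele frequency
    if not 0.0 < p < 1.0:
        raise ValueError("monomorphic locus: test is not defined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival function, 1 df
    return chi2, p_value


# Hypothetical counts that are exactly in equilibrium (p = q = 0.5).
chi2_eq, p_eq = hardy_weinberg_chi2(25, 50, 25)
```

For small expected counts an exact test is usually preferred over the χ² approximation.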
Linkage disequilibrium between the genotype distributions of two SNPs was characterized by Pearson's correlation coefficient from a 3 × 3 contingency table [25]. The strength of the correlation between genotypes was tested by Fisher's exact test (procedure FREQ, SAS).
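As an illustration, Pearson's correlation coefficient can be computed directly from a 3 × 3 table of joint genotype counts. The sketch below assumes that genotypes are scored 0, 1, and 2 by the number of minor alleles, which the source does not state explicitly; the function name and the example tables are hypothetical.

```python
import math
from typing import Sequence


def pearson_from_table(table: Sequence[Sequence[int]]) -> float:
    """Pearson r for two SNPs from a table of joint genotype counts.

    table[i][j] = number of patients with i minor alleles at SNP 1
    and j minor alleles at SNP 2 (assumed scoring 0, 1, 2).
    """
    n = sum(sum(row) for row in table)
    mean_x = sum(i * sum(row) for i, row in enumerate(table)) / n
    mean_y = sum(j * c for row in table for j, c in enumerate(row)) / n
    sxx = sum(c * (i - mean_x) ** 2 for i, row in enumerate(table) for c in row)
    syy = sum(c * (j - mean_y) ** 2 for row in table for j, c in enumerate(row))
    sxy = sum(c * (i - mean_x) * (j - mean_y)
              for i, row in enumerate(table) for j, c in enumerate(row))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical tables: perfectly concordant genotypes and independent genotypes.
r_concordant = pearson_from_table([[10, 0, 0], [0, 20, 0], [0, 0, 5]])
r_independent = pearson_from_table([[1, 2, 1], [2, 4, 2], [1, 2, 1]])
```

The same function applied to 2 × 2 tables of the dummy variables would give correlations between individual contrasts rather than between full genotypes.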

Results
Characteristics of the eligible patients from the DKTK-ROG biomarker study, which involved a total of 158 patients, with complete clinical data for multivariable analyses and DNA available for SNP genotyping are shown in Table 1a. A set of 132 patients formed the basis of the present study (exclusion due to unavailability of DNA biomaterial). In eight patients without p16 immunohistochemical staining results, the average prognostic p16 effect was accounted for by a p16-dummy variable. The total radiation doses applied ranged from 68.4 to 74.0 Gy (median 72.0 Gy). Endpoints for this biomarker study were 5-year overall survival as the first and freedom from loco-regional progression as the second endpoint. A total of 65 patients died during follow-up and 53 had a documented loco-regional recurrence. A separate group of 20 patients documented by the DKTK-ROG, who had received cisplatin-based induction chemotherapy followed by concurrent cisplatin-based definitive radiochemotherapy for squamous cell head and neck carcinoma during the same time period, was available as a validation group for the prognostic survival risk model. Their characteristics are shown in Table 1b. The total radiation doses applied ranged from 70 to 72 Gy in this group of patients. Eleven patients died during follow-up, while only five experienced a loco-regional recurrence. This event number was too low for validation of the loco-regional relapse endpoint.
The allele frequencies of the evaluated candidate SNPs are shown in Table 2 for the 132 patients from the DKTK-ROG biomarker study. No deviations from Hardy-Weinberg equilibrium were observed for any of the eight SNPs (Table 2). The genotype of rs13181 was correlated with that of rs1799793. Pearson's correlation coefficient was 0.68 (95% CI: 0.59-0.79; p < 0.0001, Fisher's exact test). This genotype correlation was based on the correlation of the rs1799793-1 and rs13181-1 contrast variables (r Pearson = 0.70), while rs1799793-1 and rs13181-2 showed only a slight correlation (r Pearson = 0.29). Homozygote major alleles at both loci were observed in 40 patients, heterozygote alleles at both loci in 46, and homozygote minor alleles at both loci in 12 patients. There were no notable correlations between the alleles of the other pairs of SNPs (with absolute values of r s < 0.20) except for the pair rs11615 and rs1799793 (r Pearson = 0.36 (95% CI: 0.23-0.53), p < 0.0001, Fisher's exact test).
Table notes: All numbers represent patient counts, except the rows headed by age and GTV total. The 7th edition of the UICC classification was applied. GTV total: combined gross tumor volume of the primary tumor and positive lymph nodes.
For the identification of clinical or SNP markers associated with survival or loco-regional recurrences, a score selection method for proportional hazard regression with LOO-CV was used [22,23]. The six markers identified for overall survival and for freedom from loco-regional recurrence are shown in Table 3. Fig. 1 shows the cross-validated Kaplan-Meier survival curves according to this six-parameter classifier, which was highly predictive (p < 0.0005, log-rank test).
The six identified markers were analyzed in detail using univariate proportional hazard analysis. Of the clinical covariates, the logarithm of the total gross tumor volume and p16 were related to survival at a p value < 0.05. Two SNPs were associated with survival, rs1799793-1 (p = 0.0031) and rs13181-2 (p = 0.017). The hazard ratio (HR) for survival according to rs1799793 was 0.418 (95% CI: 0.234-0.744) comparing the homozygote major GG genotype with the pooled AA or GA genotypes (Fig. 2a). The rs13181 homozygote minor genotype CC was associated with an HR of 2.074 (95% CI: 1.177-3.658) for survival in comparison to AC as well as AA and identified a subgroup of 17% of patients with a worse prognosis. The survival curves according to rs13181-2 are depicted in Fig. 2b. Multivariable analysis using forward selection from the six identified markers revealed that log(GTV total), rs1799793-1, and the type of concurrent chemotherapy (cisplatin vs. mitomycin C) were simultaneously correlated with survival at p < 0.05 (Table 5).
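The relation between a hazard ratio, its confidence interval, and a p value can be illustrated with a short Python sketch. It assumes a Wald-type 95% interval that is symmetric on the log scale, which the source does not confirm; the function name is hypothetical. Under this assumption, the standard error of log(HR) is recovered from the interval width and the two-sided p value follows from the normal distribution.

```python
import math


def wald_p_from_hr(hr: float, ci_low: float, ci_high: float,
                   z_crit: float = 1.959964) -> float:
    """Two-sided Wald p value implied by an HR and its 95% CI (assumed Wald-type)."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)  # SE of log(HR)
    z = math.log(hr) / se
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


# rs1799793-1: HR 0.418 (95% CI: 0.234-0.744), as reported in the text.
p_rs1799793 = wald_p_from_hr(0.418, 0.234, 0.744)
```

For rs1799793-1 this back-calculation gives a value of about 0.003, in line with the reported p = 0.0031. Score or likelihood ratio tests can give somewhat different p values for the same estimate.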
The association of the SNP markers rs1799793-1 and rs13181-2 with survival was also analyzed in the validation dataset of 20 patients receiving induction chemotherapy. The rs13181-2 variable was associated with survival (p = 0.030, score χ² test), with an HR of 9.0 for the homozygote minor patients vs. the others. There was a trend toward longer survival in rs1799793 homozygote major patients vs. the others (HR: 0.378, p = 0.089, score χ² test). Because of the small sample size, this validation has to be regarded as preliminary. A prospective biomarker validation study by the DKTK-ROG is underway and finished patient recruitment in 2018.
The six-parameter model with the highest score χ² statistic for association with freedom from loco-regional recurrence is shown in Table 3. The cross-validated freedom from loco-regional recurrence curves of the high and low-risk groups are shown in Fig. 3. The p value for comparison of these curves was p = 0.062 using the log-rank test and p = 0.025 using the Wilcoxon test. As the two curves do not diverge increasingly with follow-up, deviations from the proportional hazards assumption were suspected, but not detected by the Kolmogorov-Smirnov type supremum test [26]. The results of univariate proportional hazard analysis of all selected markers are shown in Table 4. At the freedom from loco-regional recurrence endpoint, log(GTV total), rs1799793-1, and rs13181-2 were associated with outcome at a p value of <0.05. Freedom from loco-regional recurrence curves according to rs1799793-1 and rs13181-2 are shown in Fig. 4a, b. Due to the correlation between rs1799793-1 and rs13181-2, multivariable analysis selected log(GTV total), rs1799793-1, and rs17655-1 as independent prognostic factors by the forward method (Table 5).

Discussion
This retrospective multicentre study analyzed the predictive value of SNPs in genes associated with nucleotide excision repair, such as ERCC2, ERCC1, and ERCC5; this pathway is a major route for the removal of cisplatin-DNA or mitomycin C adducts. In addition, SNPs in DNA single strand break (XRCC1) and double strand break repair genes (XRCC6, ATM) were analyzed. Patients with locally advanced head and neck cancer were treated with definitive radiotherapy and concurrent chemotherapy. In this study, the homozygote major GG rs1799793 genotype was associated with improved and the homozygote minor CC rs13181 genotype with worse survival or ffLRR than the other respective genotypes in patients with locally advanced oro- or hypopharyngeal or oral cavity carcinoma treated with concurrent radiochemotherapy. The rs1799793 minor allele frequency of 39% in this study is similar to that in other European samples, as obtained from HaploReg v.4.1 data [27]. The observations made by Lopes-Aguiar [28] were heading in the same direction, using a dominant model for rs1799793 after concurrent definitive radiochemotherapy in a smaller group of patients. Farnebo et al. [7] also found worse survival in homozygote minor allele rs13181 patients after definitive radiotherapy for head and neck carcinoma using a recessive model. In stage I and II head and neck cancer treated with radiotherapy alone, no significant prognostic value of rs1799793 or rs13181 for overall survival was found in two studies [2,17]. Other retrospective studies on the prognostic value of ERCC2 SNPs enrolled heterogeneously treated patients, including those treated with surgery [11,12], or did not use overall survival as an endpoint [11]. The study by Zhong et al. [12] on patients treated with surgery with or without postoperative radiotherapy [29] concluded that a prognostic effect of rs13181 might be therapy dependent [12].
Table 3 Markers identified by best six covariate subset score selection proportional hazard analysis and leave-one-out cross-validation.
Fig. 1 Survival-cross-validation: high and low-risk group. Cross-validated survival curves in the high (red) and low-risk (blue) groups separated at the median prognostic index from the training datasets, entering the SNP genotype data in addition to the clinical covariates into the model. There was a significant difference between curves (p = 0.0005, log-rank test).
Mechanisms which could explain a decreased effectiveness of cisplatin-based chemotherapy and radiotherapy in patients with rs13181 and rs1799793 variant-type tumor cells are: (1) stronger synchronization in the S-phase due to intensely induced p53 expression [30] during fractionated irradiation or (2) less chromosomal damage after X-rays in minor type rs13181 cells [16]. The minor variant of rs1799793 is associated with reduced mRNA levels [31,32]. Moisan et al. found that reduced expression of ERCC2 RNA can lead to a G2/M block and thereby alter the radiation sensitivity of cycling cells [33]. In addition, variants of the ERCC2 gene at codons 312 and 751 might alter the mutational spectrum of tumors in these patients and thereby modify the sensitivity toward radiochemotherapy [34]. In non-small-cell lung cancer, lower response rates to palliative cisplatin-based chemotherapy were found in rs13181 [35] and rs1799793 minor variants [36] using a recessive model, in concordance with the findings of the present study.
Fig. 3 Freedom from loco-regional recurrence-cross-validation: high and low-risk group. Cross-validated freedom from loco-regional recurrence curves in the high (red) and low-risk group (blue) defined by SNP genotypes and clinical covariates. The log-rank and the Wilcoxon test for comparison of these curves resulted in p = 0.062 and p = 0.025, respectively.
Table abbreviations: LOO leave-one-out, log(GTV total) logarithm to the base e of the combined gross tumor volume of the primary tumor and positive lymph nodes, Type-concurrCTX type of concurrent chemotherapy given (cisplatin-based vs. mitomycin C-based), p16 p16 overexpression (positive vs. negative), p16-dummy-var dummy variable indicating whether p16 immunohistochemistry is available or missing.
The DNA for genotyping of the SNPs analyzed in this study was obtained from FFPE tumor samples containing various amounts of normal and tumor tissue, in contrast to the peripheral blood lymphocytes used in most other studies. However, a concordance rate of 99% for ERCC2 SNP genotyping between FFPE colorectal tumor material and peripheral blood was found in the study of Van Huis-Tanja [37]. In addition, somatic mutations in the ERCC2 gene in tumors never affected the rs13181 or rs1799793 site [38]. Therefore, the results from both sources of cells are likely to be in concordance with one another.
The internal validity of ERCC2 SNPs as prognostic factors was analyzed by LOO-CV. The external validity was analyzed using data from patients receiving induction chemotherapy and cisplatin-based concurrent radiochemotherapy and will be further analysed in a prospective multicentre trial of the DKTK-ROG [14,15].
The rs1799793 and rs13181 SNPs at ERCC2 had a high predictive value for overall survival and freedom from loco-regional recurrence after definitive radiochemotherapy. Predictive tools are urgently needed for radiation dose escalation or further treatment intensification for high-risk patients receiving cisplatin-based radiochemotherapy as long as the long-term prognosis of these patients is about 50% or below. While this study is larger and the group of selected patients more homogeneously treated than in previous studies evaluating the association of ERCC2 SNPs with outcome, further validation is warranted. A prospective biomarker study of the DKTK-ROG is underway to validate and confirm the clinical relevance of our findings.
Acknowledgements The authors gratefully acknowledge the excellent cooperation and want to thank all pathologists and their local tissue
Fig. 4 Freedom from loco-regional recurrence: SNP rs1799793 and SNP rs13181. a Freedom from loco-regional recurrence-SNP rs1799793. Freedom from loco-regional recurrence curves according to rs1799793 genotypes (p = 0.009, log-rank test). b Freedom from loco-regional recurrence-SNP rs13181. Freedom from loco-regional recurrence curves according to rs13181 genotypes (p = 0.03, log-rank test).
Funding Open access funding provided by Projekt DEAL.
Author contributions MG: conceived and designed the analysis, collected data, contributed to data analysis tools, discussion of possible data analysis, performed the analysis, wrote the paper. AS: acquisition of data, or analysis and interpretation of data. CP: collected data, analysis and interpretation of data. Research Center (DKFZ, Heidelberg), he has been or is still responsible for collaborations with a multitude of companies and institutions, worldwide. In this capacity, he discussed potential projects with and has signed/signs contracts for his institute(s) and for the staff for research funding and/or collaborations with industry and academia, worldwide, including but not limited to pharmaceutical corporations like Bayer, Boehringer Ingelheim, Bosch, Roche and other corporations like Siemens, IBA, Varian, Elekta, Bruker and others. In this role, he was/is further responsible for commercial technology transfer activities of his institute(s), including the DKFZ-PSMA617 related patent portfolio [WO2015055318 (A1), ANTIGEN (PSMA)] and similar IP portfolios. He confirms that to the best of his knowledge none of the above funding sources was involved in the preparation of this paper. MS: Research grant was contributed by AstraZeneca in 2019 and 2020. MS confirms that the above mentioned funding source was not involved in the study design or materials used, neither in the collection, analysis, and interpretation of data nor in the writing of the paper. Consultant For the present study, MK confirms that none of the above mentioned funding sources were involved in the study design or materials used, nor in the collection, analysis and interpretation of data nor in the writing of the paper.
Ethical approval Statement on ethics approval and consent by the ethics committee of the Faculty of Medicine, University Duisburg-Essen, University Hospital Essen (Prof. Havers, Prof. Schara) is available: registration number 14-5816-BO.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.