Potential surrogate endpoints for overall survival in locoregionally advanced nasopharyngeal carcinoma: an analysis of a phase III randomized trial

The gold standard endpoint in trials of locoregionally advanced nasopharyngeal carcinoma (NPC) is overall survival (OS). Using data from a phase III randomized trial, we evaluated whether progression-free survival (PFS), failure-free survival (FFS), distant failure-free survival (D-FFS) or locoregional failure-free survival (LR-FFS) could be reliable surrogate endpoints for OS. Between July 2002 and September 2005, 316 eligible patients with stage III-IVB NPC were randomly assigned to receive either radiotherapy alone or chemoradiotherapy. 2- and 3-year PFS, FFS, D-FFS, and LR-FFS were tested as surrogate endpoints for 5-year OS using Prentice’s four criteria. The Spearman’s rank correlation coefficient was calculated to assess the strength of the associations. After a median follow-up time of 5.8 years, 2- and 3-year D-FFS and LR-FFS were not significantly different between treatment arms, in rejection of Prentice’s second criterion. Being consistent with all Prentice’s criteria, 2- and 3-year PFS and FFS were valid surrogate endpoints for 5-year OS; the rank correlation coefficient was highest (0.84) between 3-year PFS and 5-year OS. In conclusion, PFS and FFS at 2 and 3 years may be candidate surrogate endpoints for OS at 5 years; 3-year PFS may be more appropriate for early assessment of long-term survival.

Scientific RepoRts | 5:12502 | DOi: 10.1038/srep12502 significantly better overall survival (OS) than the control arm. The CRT therapeutic regimen established in this trial is now considered a standard of care for locoregionally advanced NPC 6 .
OS is the gold standard endpoint in trials for NPC; 5-year OS is commonly used to assess the long-term benefits of a treatment. OS is easily and reliably measured and is easy to interpret. However, analysis of OS requires a large sample size and long follow-up duration to detect significant differences; changes in OS may also be diluted by non-cancer deaths and administration of subsequent treatments after progression.
Identification of surrogate endpoints for OS would shorten trial duration and thus speed up the identification of effective treatments for NPC. Ideally, such surrogates should be based on clinical information that is available as soon as possible after definitive therapy. In NPC trials, progression-free survival (PFS) and failure-free survival (FFS) represent potential surrogate endpoints; distant failure-free survival (D-FFS) and locoregional failure-free survival (LR-FFS) may also serve as valid surrogates. To the best of our knowledge, no study has yet validated these endpoints at an early time-point (e.g., 2 and 3 years) as surrogates for long-term survival (e.g., 5-year OS) in locoregionally advanced NPC. To establish an endpoint as a surrogate, a set of statistical conditions, known as Prentice's criteria, need to be met 7 . The purpose of this analysis was to evaluate whether 2-and 3-year PFS, FFS, D-FFS or LR-FFS (as defined below) have potential as surrogate endpoints for 5-year OS according to Prentice's criteria based on data from our phase III randomized trial.

Methods
Patients, treatment and follow-up. The data used in this analysis was based on the trial progress report 5 and was extracted from the trial database on February 25, 2011. The study was approved by the Institutional Review Board of Sun Yat-sen University Cancer Center, and was conducted in accordance with the Good Clinical Practice guidelines; all patients provided written informed consent before treatment.
The study cohort consisted of 316 eligible patients (158 patients randomly assigned to each group). Patients were eligible for the trial if they met the following inclusion criteria: (1) histologically-proven, non-keratinizing or undifferentiated NPC (World Health Organization type II or III); (2) stage III-IVB (T3-4NxM0 or TxN2-3M0) disease according to the 1997 American Joint Commission on Cancer staging system; (3) an Eastern Cooperative Oncology Group system performance status of 0-2; (4) adequate renal function, as demonstrated by a creatinine clearance rate of at least 60 mL/min; and (5) adequate hematologic function, as demonstrated by a leukocyte count ≥ 4,000/ μ L and platelet count ≥ 100,000/ μ L. Patients were excluded for any of the following exclusion criteria: keratinizing squamous cell carcinoma or adenocarcinoma; age ≥ 70 or ≤ 16 years; pregnancy or lactation; history of renal disease; unstable cardiac disease requiring treatment; a history of previous radiotherapy or chemotherapy; or a history of prior malignancy.
All enrolled patients received 2-dimensional RT (2 Gy fractions 5 five times per week). The cumulative doses were > 66 Gy to the gross tumor, 60 to 66 Gy to the involved areas of the neck and > 50 Gy to the uninvolved areas. For the CRT group, concurrent chemotherapy with weekly cisplatin (40 mg/m 2 on day 1) was given intravenously for 7 weeks during the RT phase. Three subsequent adjuvant chemotherapy cycles, consisting of cisplatin (80 mg/m 2 intravenously) on day 1 and fluorouracil (120 h intravenous infusion; 800 mg/m 2 daily) on days 1-5 were administered during weeks 5, 9 and 13 after the completion of RT. There were no dose modifications during concurrent chemotherapy. Adjuvant chemotherapy dose modifications were based on nadir blood counts and the interim toxicities of the preceding cycle. Further details are provided in the trial progress report 5 . Whenever possible, patients with documented relapse or persistent disease received salvage treatments (re-irradiation, chemotherapy, surgery) according to institutional guidelines.
Patients attended follow-up visits every 2 months in year 1, every 3 months in the subsequent 2 years, and then every 6 months thereafter (or until death). All local recurrences were diagnosed by fiberoptic endoscopy and biopsy or MRI/intensive CT of the nasopharynx and the skull base. Regional recurrences were diagnosed by clinical examination of the neck; irresolute cases were confirmed by fine needle aspiration or MRI/intensive CT scans. Distant metastases were diagnosed by clinical symptoms, physical examinations and imaging methods including chest X-ray, bone scan, MRI, CT and abdominal sonography. Study endpoints. The major trial endpoint was 5-year OS (time from randomization to death from any cause). The primary potential surrogate endpoints in this study were 2-and 3-year PFS and FFS; PFS was defined as the time from randomization to failure or death from any cause, FFS as the time from randomization to the first failure at any site. Additionally, we also examined secondary potential surrogate endpoints: 2-and 3-year D-FFS and LR-FFS, which were defined as time from randomization to locoregional or remote failure, respectively. Patients with no documented evidence of events were censored at the date of last follow-up; all events that occurred after 2 or 3 years were also censored.

Statistical analysis.
Prentice's four criteria 7 were used to assess the validity of the surrogates. A surrogate for a true endpoint should yield a valid test of the null hypothesis that no correlation exists between the treatment and the true response. The following four criteria were tested to validate Prentice's criteria: 1) the treatment is significantly prognostic for the true endpoint (i.e., 5-year OS); 2) the treatment is significantly prognostic for the surrogate endpoint (e.g., 2-and 3-year PFS, FFS, D-FFS or LR-FFS); 3) the surrogate endpoint is significantly prognostic for the true endpoint; and 4) the full effect of the treatment on the true endpoint should be captured by the surrogate endpoint. The incidence rates for the true and surrogate endpoints were estimated using the Kaplan-Meier method and compared using the log-rank test 8 . Adjusted Cox proportional hazard models were used to determine hazard ratios (HRs) for the two arms with respect to each survival outcome 9 . The following parameters were included as covariates for each analysis: age (> 50 vs.≤ 50 years), sex (male vs. female), T classification (T3-4 vs. T1-2), N classification (N1-3 vs. N0) and treatment arm (CCRT + AC vs. RT) for Prentice's criteria 1, 2 and 4, and surrogate endpoints (with vs. without events ≤ 2/3 years) for Prentice's criterion 3. Additionally, the Spearman rank correlation coefficient (ρ ) for the distributions of the candidate surrogate endpoints and 5-year OS at the individual level were calculated using a bivariate survival model 10 to assess the strength of the associations as ρ 2 reflects the amount of variation explained by the surrogate. SPSS 19.0 software (IBM, Armonk, NY, USA) was used for all analysis.

Results
Between July 2002 and September 2005, 316 eligible patients were randomly assigned to the treatment arms in the phase III trial (158 patients in each group). Median follow-up for the entire cohort was 5.8 years (range, 0.1-8.5 years). Pretreatment characteristics were well-balanced between the two arms ( after CRT compared to RT. Therefore, Prentice's first criterion, that the treatment is prognostic for the true endpoint, was met.  Table 1). However, treatment arm was not a significant prognostic factor for 2-and 3-year D-FFS or LR-FFS, though treatment arm had a marginally significant effect on these outcomes (Table 2). Therefore, Prentice's second criterion, that treatment is prognostic for the surrogate endpoint, was only met for 2-year and 3-year PFS and FFS.
Prentice's Criterion 3. In both the 2-year and 3-year evaluations, failure or death due to any cause, and failure at any site within ≤ 2/3 years had a significant impact on 5-year OS (all P Cox < 0.001; Table 3). As treatment arm was not prognostic for distant and locoregional failure within ≤ 2 or 3 years, Prentice's third criterion was not tested for these endpoints. Prentice's third criterion, that the surrogate endpoint is prognostic for the true endpoint, OS, was met for 2-and 3-year PFS and FFS.
Prentice's Criterion 4. To assess this criterion, the first hypothesis tested whether 5-year OS was independent of treatment in patients who suffer failure or die due to any cause within ≤ 2/3 years, or suffer failure at any site within ≤ 2/3 years. The second hypothesis tested whether 5-year OS was independent of treatment if none of the abovementioned events occurred within ≤ 2/3 years (Table 4). The effect of treatment on 5-year OS was not significant for either hypothesis (all P Cox > 0.10; Table 4). These findings suggest the full effect of treatment on 5-year OS can be explained by the surrogate endpoints PFS and FFS, independently of treatment; therefore, Prentice's fourth criterion was satisfied.

Discussion
This study is a secondary analysis of a prospective randomized phase III trial, in which the treatment effects of CRT and RT were compared in locoregionally advanced NPC. The purpose of this analysis was to apply Prentice's surrogacy criteria to evaluate which secondary endpoints could be useful surrogates for 5-year OS. Both PFS and FFS at 2 and 3 years were consistent with all four of Prentice's criteria, while D-FFS and LR-FFS were not significantly different between arms at 2 and 3 years. Moreover, compared According to a recent population-based study in China, NPC has a relatively longer natural history than other aggressive malignancies such as lung cancer, gastric cancer, liver cancer and esophageal cancer 11 . Five-year OS for locoregionally advanced NPC ranges from 58 to 75% 12 . Identification of surrogate endpoints could enable prediction of the effects of treatment on long-term survival at earlier time-points; in future trials, this could reduce the follow-up period and accelerate the development of therapeutic regimens. Surrogate endpoints measured at later time-points are less useful as they become closer to the final endpoint. However, a very short period of observation may lead to a loss of events and lead to difficulties in observing significant differences and inaccurate prediction of the true endpoint. In our analysis, both D-FFS and LR-FFS measured at 2 and 3 years did not obey Prentice's second criterion and are not suitable surrogate endpoints. Though several meta-analyses 6,13,14 have demonstrated CRT can effectively reduce distant and locoregional failures, these studies included many patients from a number of trials, which makes it relatively easier to obtain a positive result. In general, it is difficult for a trial to recruit so     Table 3. Surrogate endpoints as prognostic factors for 5-year overall survival a . HR, hazard ratio; CI, confidence interval. a As treatment was not prognostic for distant failure or locoregional failure, Prentice's criteria 3 and 4 were not tested for these endpoints. b The reference groups had no events ≤ 2/3 years. Generally more PFS and FFS than D-FFS and LR-FFS events occur, which increases the statistical power for PFS and FFS and enables small randomized trials to be performed. This study demonstrated both 2-and 3-year PFS and FFS were valid surrogate endpoints for 5-year OS. The rank correlation coefficient for both surrogates weakened with reduced follow-up duration, with the highest coefficient observed for 3-year PFS and 5-year OS. As PFS was defined as death or failure at any site, the overlap between the definitions of PFS and OS may account for the superior predictive ability of PFS compared to FFS. Thus, it may be appropriate to use 3-year PFS to predict long-term OS. Overall, the use of 2/3-year PFS and FFS as surrogate endpoints could reduce the number of patients that need to be recruited and shorten the duration of clinical trials by permitting earlier reporting of the results, thereby reducing the cost of developing effective therapeutic strategies for locoregionally advanced NPC. Furthermore, use of the validated surrogate endpoints would reduce the risk of abandoning potentially effective new treatments when analysis of OS is complicated by subsequent treatments after disease progression.
Several limitations of this study should be acknowledged. First, the limitations of applying statistical methods such as Prentice's criteria to validate surrogate endpoints need to be noted, and caution should be taken when interpreting these results 15 . Prentice's fourth criterion is very difficult to assess as it is formulated in terms of an equivalence setting; meeting this criterion is not necessarily definite evidence that the criterion holds 16 . Still, 2-and 3-year PFS and FFS met all four criteria, indicating their validity as surrogate endpoints for 5-year OS. Secondly, patients from only a single trial were included in this analysis. Surrogate endpoint analyses conducted using data from a number of trials could help to reinforce the conclusions of this study. Although data from only one trial were available, the findings of this study are strengthened by the fact they are derived from a relatively large cohort treated at a single institution with long-term follow-up. The trial was conducted using a reliable randomization method with standardized data collection, treatment and follow-up protocols, which should avoid potential sources of selection or other incidental bias. Third, though detection of distant metastasis and locoregional recurrence may be subject to variation depending how often imaging studies are obtained; assessment of PFS or FFS is not as definitive as OS. Also, though D-FFS and LR-FFS were not validated as surrogate endpoints, these endpoints remain useful as they reflect which sites of failure (local or distant) a treatment controls.
A surrogate endpoint can only be validated for the treatment assessed; extrapolation to a therapeutic regimen with a different mechanism of action may not be warranted. Here, we investigated CRT in a prospectively-identified cohort of patients with locoregionally advanced NPC. This CRT regimen has been recommended as a standard treatment option for locoregionally advanced NPC 6 , indicating this analysis is contemporary with a broad clinical applicability. However, we still need to directly verify  Table 4. Effect of treatment on 5-year overall survival in patients stratified by the surrogate endpoints a . CRT, chemoradiotherapy; RT, radiotherapy. a As treatment was not prognostic for distant failure or locoregional failure, Prentice's criteria 3 and 4 were not tested for these endpoints. b The survival rates for surrogate endpoints were estimated using the Kaplan-Meier method. c P-value based on the chi-square test using the Cox proportional hazards model. The following parameters were included in the Cox proportional hazards model by backward elimination: age (> 50 vs. ≤ 50 years), sex (male vs. female), T classification (T3-4 vs. T1-2), N classification (N1-3 vs. N0) and treatment arm (CRT vs. RT).
surrogate endpoints for trials assessing targeted or immunological therapies or trials that include selected patient subgroups. These analyses represent an initial attempt to explore possible surrogate endpoints for OS in locoregionally advanced NPC on the basis of Prentice's four criteria; 2-and 3-year PFS and FFS were identified as valid surrogate endpoints for early assessment of treatment effects. Identification of other potential biomarkers, such as Epstein-Barr virus DNA load 17 and microRNAs 18 , as surrogate endpoints may help to better predict long-term survival outcomes at an early stage before treatment failure occurs, and enable appropriate salvage treatments to be administered in a timely manner. However, surrogate endpoints do not eliminate the need for long-term follow-up, as unexpected late side-effects may not be captured by the surrogate endpoints. Clinical trials should continue to be designed with potential surrogate endpoints as part of the trial protocol, along with OS.