Main

Prediction of residual risk following adjuvant endocrine treatment in early breast cancer has become a critical component in the selection of treatment options. In addition, recent data on the impact of up to 10 years of adjuvant endocrine therapy in early breast cancer suggest that some women may benefit from extended treatment with tamoxifen (Goss et al, 2003; Davies et al, 2013; Gray et al, 2013). At present, patients and their clinicians may face three time-related choices in the treatment of luminal breast cancers: (1) whether it is beneficial to treat patients with adjuvant chemotherapy, and therefore delay endocrine therapy; (2) whether to follow a switch strategy for endocrine therapy; and (3) whether to extend endocrine therapy beyond 5 years, for a total of 10 years. Multiple diagnostic algorithms have been developed to provide information on residual risk for patients facing these choices; however, there is limited information on how time affects the risk assessments provided by these diagnostic tools.

A recent study developed a residual risk model (IHC4; Cuzick et al, 2011) combining immunohistochemical (IHC) assessment of ER, PgR, Ki67 and HER2, which provided information on residual risk equivalent to that of the multiparameter, PCR-based OncotypeDx test. The IHC4+C score (IHC markers plus clinicopathologic parameters) also provides additional information on the residual risk of distant recurrence for ER-positive primary breast cancer patients receiving adjuvant endocrine therapy, beyond that provided by the Adjuvant! Online and NPI intermediate-risk groups (Barton et al, 2012). Another IHC model (Mammostrat; Ring et al, 2006), combining five IHC biomarkers (p53, NDRG1, CEACAM5, SLC7A5 and HTF9C), has been shown to significantly improve on traditional prognostic factors in predicting outcome for ER-positive breast cancer patients (Ross et al, 2008; Bartlett et al, 2010, 2012).

Non-proportional effects of prognostic scores have been reported previously, with better prediction for patients at high risk of early relapse than for those at risk of later recurrence (Buyse et al, 2006; Desmedt et al, 2007; Haibe-Kains et al, 2008). Haibe-Kains et al (2008) suggested three possible reasons for this: (i) different biological mechanisms underlying early and late relapses; (ii) the statistical methodology, with the scores having been developed on cohorts based on median follow-up; and (iii) the quality of survival data, which may deteriorate with increasing duration of follow-up, because patients are difficult to follow over long periods and the resulting level of censoring is high.

In this study we investigated the impact of follow-up duration on the IHC4 and Mammostrat scores to determine whether these two prognostic panels provide information on the risk of early or late recurrence.

Materials and Methods

Materials

The Edinburgh Breast Conservation Series (BCS) represents a fully documented, consecutive cohort of 1812 patients treated by breast conservation surgery, axillary node sampling or clearance, and whole breast radiotherapy at the Edinburgh Cancer Centre between 1981 and 1998 (Thomas et al, 2009; Bartlett et al, 2010). Following ethical approval (Lothian Local Research Ethics 04) tissue blocks were retrieved from all cases and sufficient material was available from 1686 cases for assembly into tissue microarrays (TMAs) (Supplementary Figure 1A). For all cases with available tissue, tumours were regraded on whole sections by a single pathologist (Thomas et al, 2009).

The Tamoxifen Exemestane Adjuvant Multinational (TEAM) trial is a multinational randomised, open-label, phase III trial in postmenopausal women with hormone receptor-positive early breast cancer testing the efficacy of 5 years of exemestane (25 mg once per day) vs tamoxifen (20 mg once per day for 2.5–3 years) followed by exemestane for a total of 5 years (van de Velde et al, 2011). Five of nine participating countries provided paraffin-embedded tumour samples for pathology sub-studies (Bartlett et al, 2011a). Tissue blocks were received at a central laboratory and 4598 were found suitable for TMA construction (Supplementary Figure 1B).

Biomarker analysis

Immunohistochemical staining for a panel of biomarkers (ER, PgR, HER2, Ki67, HTF9C, CEACAM5, NDRG1, p53 and SLC7A5) and fluorescence in situ hybridisation (FISH) for HER2 were performed on either sextuplicate (ER and PgR) or triplicate (all other markers) 0.6 mm² TMA cores. For the Edinburgh BCS cohort, all markers were dual-scored by expert observers, as described by Kirkegaard et al (2006). For TEAM patients, ER, PgR and Ki67 scores were derived by quantitative image analysis using the Ariol system, with algorithms validated against both whole sections and manual assessment (Faratian et al, 2009; Bartlett et al, 2011a). ER data were recorded as a histoscore (Kirkegaard et al, 2006), and Ki67 and PgR data as the percentage of positive cells (ATAC and Ki67 guidelines; Dowsett et al, 2011). HER2 results were scored according to the UK guidelines (Walker et al, 2008; Bartlett et al, 2011b), with cases regarded as HER2-amplified if any core showed amplification/overexpression. Positivity for p53, HTF9C (recently renamed TRMT2A), CEACAM5, NDRG1 and SLC7A5 was recorded as previously described (Ring et al, 2006; Ross et al, 2008; Bartlett et al, 2010, 2012).

Generation of prognostic scores

The IHC4 model (Cuzick et al, 2011) uses a linear combination of four markers: ER, PgR, HER2 and Ki67. Continuous marker scores were normalised before inclusion in the model: ER histoscores (range 0–300) were divided by 30, and PgR scores (percentage of cells staining positive, range 0–100) were divided by 10, giving continuous values between 0 and 10. Ki67 scores were represented as the percentage of positive cells, and HER2 was treated as a dichotomous variable. The IHC4 risk score was generated according to the previously specified algorithm (Cuzick et al, 2011). The IHC4 score was analysed as a continuous risk score, except in Kaplan–Meier analyses, in which it was categorised into three groups using two cutoff points from the original study corresponding to 10-year distant recurrence rates of 10% and 20%; however, these cutoffs have not been previously validated (Cuzick et al, 2011).
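The input rescaling described above can be sketched in Python. This is a minimal illustration under stated assumptions: it covers only the normalisation of the raw marker readings and the three-group split used for Kaplan–Meier plots. The IHC4 coefficients and the two published cutoff values are deliberately not reproduced (they should be taken from Cuzick et al, 2011), the function and field names are hypothetical, and assigning a score that equals a cutoff to the higher group is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IHC4Inputs:
    """Marker values on the scales described in the text (field names are hypothetical)."""
    er10: float       # ER histoscore (0-300) / 30  -> 0-10
    pgr10: float      # PgR % positive cells (0-100) / 10 -> 0-10
    ki67_pct: float   # Ki67 kept as % positive cells
    her2: int         # 1 = HER2-amplified/overexpressed, 0 = negative


def normalise_ihc4_inputs(er_histoscore: float, pgr_percent: float,
                          ki67_percent: float, her2_positive: bool) -> IHC4Inputs:
    """Rescale raw IHC readings to the inputs used by the IHC4 model."""
    if not 0 <= er_histoscore <= 300:
        raise ValueError("ER histoscore must lie in [0, 300]")
    if not 0 <= pgr_percent <= 100:
        raise ValueError("PgR percentage must lie in [0, 100]")
    if not 0 <= ki67_percent <= 100:
        raise ValueError("Ki67 percentage must lie in [0, 100]")
    return IHC4Inputs(er10=er_histoscore / 30,
                      pgr10=pgr_percent / 10,
                      ki67_pct=ki67_percent,
                      her2=int(her2_positive))


def three_group(score: float, lower_cut: float, upper_cut: float) -> str:
    """Split a continuous score into low/intermediate/high using two cutoffs.

    The cutoff values must come from the original IHC4 publication; they are not
    reproduced here. Scores equal to a cutoff go to the higher group (assumption).
    """
    if lower_cut >= upper_cut:
        raise ValueError("lower_cut must be below upper_cut")
    if score < lower_cut:
        return "low"
    if score < upper_cut:
        return "intermediate"
    return "high"
```

Keeping normalisation separate from the weighting makes it easy to check that each marker is on the expected 0–10 (ER, PgR), percentage (Ki67) or binary (HER2) scale before any coefficients are applied.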

The Mammostrat model (Ring et al, 2006) uses five IHC markers: SLC7A5, CEACAM5, NDRG1, HTF9C and p53. The Mammostrat risk score was generated by combining the binary (positive/negative) staining results for all markers according to the previously specified algorithm (Ring et al, 2006; Ross et al, 2008; Bartlett et al, 2010, 2012). The Mammostrat score was categorised into low (0), medium (>0 and <0.7) and high (≥0.7) risk groups as previously specified (Ring et al, 2006; Ross et al, 2008; Bartlett et al, 2010, 2012).
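The risk-group rule above can be written as a small mapping function. This sketch assumes the continuous Mammostrat score has already been computed with the published weights (not reproduced here); the function name is hypothetical, and treating any score at or below 0 as low risk is an assumption, since the text only specifies the value 0 for that group.

```python
def mammostrat_category(score: float) -> str:
    """Map a Mammostrat risk score to its risk group.

    Cut-points follow the text: low = 0, medium = >0 and <0.7, high = >=0.7.
    Scores <= 0 are treated as low risk (assumption).
    """
    if score <= 0:
        return "low"
    if score < 0.7:
        return "medium"
    return "high"
```

For example, a patient with a score of 0.35 falls into the medium group, while a score of exactly 0.7 is classified as high risk.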

Missing data

The known technical limitations of TMAs inevitably result in missing data (Voduc et al, 2008). In the Edinburgh BCS cohort, a large proportion of values (32.1%) were missing for PgR, measured as the percentage of positive cells. Multiple imputation was therefore performed using all predictors plus the event indicator and the Nelson–Aalen estimator of the cumulative baseline hazard, as recommended by White and Royston (2009). We used the mi impute chained command in Stata to perform multiple imputation by chained equations, generating 42 imputed data sets based on the rule of thumb suggested by White et al (2011). The results from analyses of each imputed data set were combined using Rubin’s rules to produce estimates and confidence intervals that incorporate the uncertainty of the imputed values (Rubin, 1987).
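The pooling step can be illustrated with a short Python sketch of Rubin’s rules. It is a minimal, hypothetical helper (not the Stata implementation used in the study) that combines one parameter, such as a log hazard ratio, across m imputed data sets: the pooled estimate is the mean of the estimates, and the total variance adds the within-imputation variance to the between-imputation variance inflated by (1 + 1/m).

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine one parameter across m imputed data sets using Rubin's rules.

    estimates: point estimates from each imputed data set (e.g. log hazard ratios)
    variances: their squared standard errors (assumed strictly positive)
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching variances")
    q_bar = mean(estimates)            # pooled point estimate
    w = mean(variances)                # within-imputation variance
    b = variance(estimates)            # between-imputation variance (m - 1 denominator)
    total = w + (1 + 1 / m) * b        # total variance
    if b == 0:
        return q_bar, total, math.inf  # imputations agree: no extra uncertainty
    r = (1 + 1 / m) * b / w            # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2    # Rubin's (1987) degrees of freedom
    return q_bar, total, df
```

For a hazard ratio, the pooled HR would be exp(q_bar), with a confidence interval of exp(q_bar ± t × sqrt(total)), where t is the 97.5th percentile of a t distribution with the returned degrees of freedom.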

Statistical analysis

The primary end point for this study was time to distant recurrence (TTDR), as this is the event that drives subsequent death from breast cancer. TTDR was defined as the time to distant metastasis (van de Velde et al, 2011) or death with evidence of recurrent breast cancer, with patients censored at the time of last follow-up. Additional clinical variables, when specified, were age (continuous), tumour size (continuous), number of positive nodes (continuous), histological grade (grade I–III), treatment (exemestane/tamoxifen in TEAM) and chemotherapy (yes/no). Investigation of the functional form of the continuous covariates on the log-hazard scale showed non-linear effects for tumour size and the number of positive nodes, and these were modelled as non-linear in all analyses.
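The end-point definition above can be expressed as a small derivation of (time, event) pairs. This sketch assumes per-patient event times in years from diagnosis; the record type and field names are hypothetical, and entering deaths without evidence of recurrence only through the last follow-up time (so that they are censored) is an assumption consistent with, but not spelled out in, the definition.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PatientFollowUp:
    """Times in years from diagnosis; field names are hypothetical."""
    last_follow_up: float                          # includes deaths without recurrence (assumption)
    distant_metastasis: Optional[float] = None     # None if never observed
    death_with_recurrence: Optional[float] = None  # death with evidence of recurrent breast cancer


def ttdr(p: PatientFollowUp) -> Tuple[float, int]:
    """Return (time, event) for time to distant recurrence.

    event = 1 at the earliest qualifying event; otherwise the patient is
    censored (event = 0) at last follow-up.
    """
    event_times = [t for t in (p.distant_metastasis, p.death_with_recurrence)
                   if t is not None]
    if event_times:
        return min(event_times), 1
    return p.last_follow_up, 0
```

For example, a patient with distant metastasis at 3.2 years and last follow-up at 10 years contributes (3.2, 1), whereas an event-free patient last seen at 8 years contributes (8.0, 0).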

The primary analysis was based on ER-positive patients treated with endocrine therapy (without chemotherapy). Secondary analyses were performed on all ER-positive patients irrespective of treatment. Exploratory analyses were performed on node-negative and node-positive ER-positive patients treated with endocrine therapy (without chemotherapy) and on node-positive ER-positive patients irrespective of treatment.

The proportional hazards assumption of the Cox regression models was tested by plotting scaled Schoenfeld residuals against time and testing for a zero slope. Time-varying effects were then modelled by including a covariate × log(time) interaction in the Cox model. The multivariable fractional polynomial time (MFPT) algorithm proposed by Sauerbrei et al (2007) was used to determine which variables had possible non-proportional effects, and to select the best-fitting fractional polynomial (FP) function to model these effects. We explored the effect of the derived risk scores before and after 5 years (a decision point for continued endocrine therapy) to determine whether either score provided information on the residual risk of early or late distant recurrence.
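How a covariate-by-time interaction turns a single hazard ratio into a function of time can be sketched as follows. This is an illustration only, not the MFPT selection procedure itself: it assumes the conventional first-degree FP power set of Royston and Altman (power 0 denoting log), and the coefficient names beta (main effect) and gamma (interaction) are hypothetical.

```python
import math

# Conventional fractional polynomial power set; power 0 denotes log(t).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(t: float, power: float) -> float:
    """First-degree FP transformation of follow-up time t (t > 0)."""
    if t <= 0:
        raise ValueError("time must be positive")
    if power not in FP_POWERS:
        raise ValueError(f"power {power} is not in the standard FP set")
    return math.log(t) if power == 0 else t ** power


def time_varying_hr(beta: float, gamma: float, t: float, power: float = 0) -> float:
    """Hazard ratio per unit of a covariate at time t:
    HR(t) = exp(beta + gamma * f(t)), with f the chosen FP transformation."""
    return math.exp(beta + gamma * fp_transform(t, power))
```

With power 0 this reproduces the covariate × log(time) model; power 3 corresponds to a time-cubed interaction. A negative gamma makes the hazard ratio shrink as follow-up time increases.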

The performance of the IHC4 and Mammostrat risk scores, in addition to conventional clinical risk factors, was assessed using measures of calibration and discrimination, with follow-up censored at 5, 9 and 15 years for the Edinburgh BCS cohort and at 3 and 5 years for the TEAM cohort. ‘Model calibration’ refers to how closely the survival estimates from the model agree with the observed survival (Altman et al, 2009; Moons et al, 2009). It was assessed by grouping patients into deciles of predicted risk (10 equally sized groups), producing a calibration plot of observed vs predicted probabilities of 5-year distant recurrence and calculating the calibration slope. ‘Discrimination’ is the ability of a risk score to differentiate between patients who do and those who do not experience an event during the study period (McGeechan et al, 2008; Altman et al, 2009). Discrimination was evaluated using Royston and Sauerbrei’s (2004) R2 statistic, based on their index of discrimination (D), with a difference in D of at least 0.1 taken to indicate improved prognostic separation. All statistical analyses were carried out in Stata (version 12).
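The decile-based calibration plot described above can be sketched in Python. This is a hypothetical helper, not the Stata code used in the study: patients are sorted by predicted 5-year risk and split into 10 near-equal groups, and within each group the mean predicted risk is paired with the observed risk, taken here as 1 minus the Kaplan–Meier survival at the horizon. Estimating the calibration slope itself requires refitting a survival model and is not shown.

```python
def km_risk_at(times, events, horizon):
    """Observed cumulative risk at `horizon`: 1 - Kaplan-Meier survival."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = leaving = 0
        # Process all subjects sharing this time together (ties).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
        at_risk -= leaving
    return 1 - surv


def decile_calibration(pred_risk, times, events, horizon=5.0, groups=10):
    """Return (mean predicted risk, observed KM risk) for each group of predicted risk."""
    n = len(pred_risk)
    order = sorted(range(n), key=lambda k: pred_risk[k])
    points = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        mean_pred = sum(pred_risk[k] for k in idx) / len(idx)
        observed = km_risk_at([times[k] for k in idx],
                              [events[k] for k in idx], horizon)
        points.append((mean_pred, observed))
    return points
```

Plotting the returned pairs against the line of identity gives the calibration plot; points above the line indicate that the model underestimates risk in that decile.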

Results

Data were available on 1449 patients (ER-negative and ER-positive) with 273 distant recurrences from the Edinburgh BCS cohort (median follow-up 12.9 years), and on 3766 ER-positive patients with 548 distant recurrences from the TEAM cohort (median follow-up 6.2 years). The distributions of the scores differed between the two cohorts (Supplementary Figure 2), with median IHC4 scores of 44 and 27 in the Edinburgh BCS and TEAM cohorts, respectively. A larger proportion of patients were allocated to the low Mammostrat risk group in the Edinburgh BCS cohort (64%) than in the TEAM cohort (43%). This is expected, because TEAM is a higher-risk population than the Edinburgh BCS cohort, with a larger mean tumour size (23 mm vs 16 mm), a larger proportion of grade 3 tumours (35% vs 19%) and a higher mean number of positive nodes (1.9 vs 0.5).

Non-proportional effects in a multivariable model

In multivariable modelling, the MFPT algorithm identified a significant time-by-covariate interaction for IHC4 in both patient subgroups in both cohorts, with log of time as the best-fitting FP. The parameter associated with this interaction was negative, indicating that the effect of a unit increase in IHC4 on TTDR decreased over time (Figure 1). The decrease was more prominent in the Edinburgh BCS cohort, where the adjusted HR crossed 1 (corresponding to a null effect) at approximately 6.5 years. For the Mammostrat score, a significant time-by-covariate interaction was found only in the Edinburgh BCS cohort, and only for high vs low risk (Figure 2). There was uncertainty in the best-fitting FP for this interaction: log of time was chosen for all ER-positive patients and time-cubed for ER-positive patients treated with endocrine therapy only.
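With log of time as the FP function, the time at which the adjusted HR for a unit increase in IHC4 crosses 1 follows directly from the model. Writing β for the main effect and γ for the IHC4 × log(time) interaction (symbols introduced here for illustration; the fitted values are not reported in the text):

```latex
\log \mathrm{HR}(t) = \beta + \gamma \log t,
\qquad
\mathrm{HR}(t^{*}) = 1 \;\Longleftrightarrow\; \beta + \gamma \log t^{*} = 0
\;\Longleftrightarrow\; t^{*} = \exp\!\left(-\beta/\gamma\right).
```

With β > 0 and a negative interaction γ < 0, HR(t) > 1 before t* and HR(t) < 1 after it; a crossing at approximately 6.5 years corresponds to β/γ ≈ −log 6.5 ≈ −1.87.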

Figure 1

Time-dependent adjusted hazard ratio estimate (up to 10 years) with 95% CIs (dashed lines) for a unit increase in IHC4 score for all ER-positive patients in (A) the Edinburgh BCS cohort and (B) the TEAM cohort. Adjusted for age, grade, nodes positive, treatment and chemotherapy.

Figure 2

Time-dependent adjusted hazard ratio estimate with 95% CIs (dashed lines) for high-risk compared with low-risk Mammostrat score in the Edinburgh BCS cohort for (A) all ER-positive patients and (B) ER-positive patients treated with endocrine therapy only. Adjusted for age, grade, nodes positive, treatment and chemotherapy.

Impact of follow-up duration on model performance

We assessed the performance of the scores in addition to clinical factors at various lengths of follow-up. Measures of discrimination are given in Table 1 for full follow-up (using all available data rather than censoring at a specific time point) and for follow-up censored at 5 years. In the Edinburgh BCS cohort, the models performed statistically better with shorter follow-up than with full follow-up, with differences in D-statistic of 0.4–0.5 and in R2 of 7–13%. There was a small improvement in model performance with shorter follow-up in the TEAM cohort, with increases in R2 of 1.5–3% and differences in D-statistic of 0.05–0.1. Calibration of the combined model in the Edinburgh BCS cohort was better with follow-up censored at 5 years than with full follow-up, with calibration slopes of 1.0 (95% CI 0.8–1.1) and 1.2 (95% CI 0.8–1.5), respectively, for all ER-positive patients.
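Because the results above are reported both as D and as R2, it may help to see how the two are linked. Royston and Sauerbrei (2004) define R2_D = (D²/κ²) / (σ² + D²/κ²), with κ = sqrt(8/π) and σ² = π²/6 for the Cox model. The sketch below simply evaluates that formula; the function name is hypothetical.

```python
import math

KAPPA = math.sqrt(8 / math.pi)   # scaling constant relating D to the prognostic-index SD
SIGMA2_COX = math.pi ** 2 / 6    # residual variance on the log-hazard scale (Cox model)


def r2_from_d(d: float) -> float:
    """Royston-Sauerbrei explained variation R2_D for a given D statistic."""
    x = (d / KAPPA) ** 2
    return x / (SIGMA2_COX + x)
```

The mapping is non-linear: D = 1 corresponds to R2_D of about 19%, and the same absolute change in D produces a larger change in R2_D at moderate D than near zero, so differences in D and in R2 should be read together.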

Table 1 Performance data on IHC4 and Mammostrat score in addition to clinical factors

Prognostic value of scores within the first 5 years and beyond 5 years after diagnosis

To investigate whether either score provided prognostic information on TTDR beyond 5 years after diagnosis, Cox regression was performed with follow-up time divided into the intervals 0–5 years and 5–10 years. Period-specific Kaplan–Meier curves are displayed in Supplementary Figure 3. Both scores were significant independent predictors of outcome only in the first 5 years of follow-up; thereafter, there was no evidence that the scores were associated with TTDR (Table 2). For example, the interquartile HR for IHC4 score was 2.1 (95% CI, 1.1–4.1) in the first 5 years after diagnosis compared with 1.0 (95% CI, 0.5–2.0) after 5 years for all ER-positive patients in the Edinburgh BCS cohort. There was, however, evidence of a prognostic effect after 5 years for Mammostrat high vs low risk in the TEAM cohort, both for all ER-positive patients (Table 2) and for ER-positive node-negative patients treated with endocrine therapy (Supplementary Table 1), with HRs of 1.6 (95% CI, 1.0–2.4) and 3.3 (95% CI, 1.1–10.5), respectively. The same effect of Mammostrat high vs low risk in the 5–10-year period after diagnosis was seen in the Edinburgh BCS cohort for ER-positive node-negative patients treated with endocrine therapy only (Supplementary Table 1), with an HR of 2.8 (95% CI, 1.0–7.8).
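Dividing follow-up into 0–5 and 5–10 years amounts to splitting each patient’s record into counting-process episodes (the step performed by Stata’s stsplit). The Python sketch below is a hypothetical helper illustrating that split: a patient still event-free at 5 years contributes a censored first episode and re-enters the second period at 5 years, and follow-up beyond 10 years is censored.

```python
def split_episodes(time, event, cut=5.0, end=10.0):
    """Split one patient's (time, event) record at `cut` years, censoring at `end`.

    Returns counting-process rows (start, stop, event, period label).
    """
    first = f"0-{cut:g}"
    second = f"{cut:g}-{end:g}"
    rows = [(0.0, min(time, cut), int(bool(event) and time <= cut), first)]
    if time > cut:
        rows.append((cut, min(time, end), int(bool(event) and time <= end), second))
    return rows
```

Period-specific hazard ratios can then be obtained from a Cox model on these rows with a score × period interaction, or by fitting each period separately; which of these the study used is not stated in the text.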

Table 2 Period-specific multivariate Cox regression of IHC4 and Mammostrat score

Comparison of IHC4 and Mammostrat

Compared with the Mammostrat score, the IHC4 score provided more prognostic information beyond that of clinical factors for all ER-positive patients in both cohorts in the first 5 years of follow-up (Supplementary Table 2; increase in R2: 6.6% vs 2.5% and in D-statistic: 0.12 vs 0.07 in the Edinburgh BCS cohort; increase in R2: 5.0% vs 2.5% and in D-statistic: 0.16 vs 0.08 in the TEAM cohort). Similarly, for ER-positive patients treated with endocrine therapy in the TEAM cohort, the IHC4 score was the stronger predictor of outcome, whereas in the Edinburgh BCS cohort the prognostic information provided by the two scores was similar (increase in R2: 3.7% vs 3.2% and in D-statistic: 0.12 vs 0.11).

The addition of both scores to clinical factors

When the scores were entered simultaneously into a multivariate Cox regression model, the addition of both scores to clinical factors provided statistically significant information (P<0.05) in the first 5 years of follow-up for both subsets of patients in both cohorts, with increases in R2 of 5–6% and in D-statistic of 0.16–0.21 (Supplementary Table 2). However, in this simultaneous model, both scores remained significant independent predictors of TTDR in the first 5 years of follow-up only for all ER-positive patients in the TEAM cohort, with an interquartile HR for IHC4 score of 1.6 (95% CI, 1.1–2.4) and HRs for medium and high vs low Mammostrat score of 1.3 (95% CI, 1.0–1.8) and 1.7 (95% CI, 1.3–2.2), respectively (Supplementary Table 3). For ER-positive patients treated with endocrine therapy in the TEAM cohort, and for both patient subgroups in the Edinburgh BCS cohort, only the IHC4 score provided significant independent prognostic information on TTDR in the first 5 years of follow-up. Although not statistically significant, the Mammostrat score provided some improvement in model discrimination over and above that provided by the IHC4 score and clinical factors for ER-positive patients treated with endocrine therapy only in the Edinburgh BCS cohort, with increases in R2 and D-statistic of 2.3% and 0.09, respectively. After adjustment for IHC4 and clinical factors, there was evidence of an effect of Mammostrat high vs low risk beyond 5 years in ER-positive, node-negative patients treated with endocrine therapy only in the TEAM and Edinburgh BCS cohorts, with HRs of 3.2 (95% CI, 1.0–10.1) and 2.8 (95% CI, 1.0–7.7), respectively (Supplementary Table 4).
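An interquartile HR rescales the per-unit Cox coefficient of a continuous score to the difference between its 75th and 25th percentiles, which makes scores on different scales comparable. The sketch below assumes that conventional definition and uses the standard-library quantile method (the "exclusive" default of statistics.quantiles); the study’s exact percentile method is not stated, and the function name is hypothetical.

```python
import math
from statistics import quantiles


def interquartile_hr(beta: float, scores) -> float:
    """HR comparing the 75th with the 25th percentile of a continuous score,
    given the per-unit log hazard ratio `beta` from a Cox model."""
    q1, _, q3 = quantiles(scores, n=4)   # quartile cut points
    return math.exp(beta * (q3 - q1))
```

For example, with scores 1–7 the quartiles are 2 and 6, so a per-unit log HR of log(2)/4 gives an interquartile HR of 2.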

Discussion

Our analyses confirm that the IHC4 and Mammostrat scores are strong prognostic predictors of TTDR, but that this prognostic value is restricted to the first 5 years after diagnosis. The prognostic effect of both scores on TTDR decreased with increasing follow-up time. A previous analysis by Sgroi et al (2013) likewise found significant prognostic ability for IHC4 for early (0–5 years) distant recurrence only.

Both scores performed well, especially in the first 5 years of follow-up: the combination of both scores significantly improved discrimination between events and non-events compared with clinical factors alone, and calibration between observed and predicted 5-year risk of TTDR was good. The IHC4 score provided more prognostic information on TTDR than the Mammostrat score in the first 5 years of follow-up, except for all ER-positive patients in the larger TEAM cohort, in which both scores provided statistically significant independent information.

Although the effects of the scores were strongest in the first 5 years after diagnosis, there was evidence of a prognostic effect of the Mammostrat score on the risk of late recurrence (beyond 5 years after diagnosis). ER-positive, node-negative patients treated with endocrine therapy only who were classified as high risk by Mammostrat had 2.8 (95% CI, 1.0–7.8) and 3.3 (95% CI, 1.1–10.5) times the risk of distant recurrence after 5 years compared with those classified as low risk, in the Edinburgh BCS and TEAM cohorts, respectively. These results suggest a possible role for the Mammostrat score in predicting the risk of late recurrence, which will need to be investigated further in other patient cohorts with long-term follow-up.

Missing data were present, although the majority of patients had information on all risk factors. We used multiple imputation, as currently recommended, to avoid the biases that can arise in complete-case analysis (Burton and Altman, 2004; Vergouwe et al, 2010).

IHC4 was analysed as a continuous score, but cutoff points are required for Kaplan–Meier analysis. To avoid the biases that arise from choosing our own cutoff points, we used those from the original study (Cuzick et al, 2011). However, these did not validate well in our cohorts, allocating only a small proportion of patients (<10%) to the low-risk group.

In conclusion, the IHC4 and Mammostrat risk scores were significantly associated with the risk of distant disease recurrence in the first 5 years after diagnosis and added prognostic information beyond that provided by clinical factors. Mammostrat may provide insights for patients and clinicians seeking to make informed decisions about extended endocrine therapy after an initial 5 years of treatment, and warrants further study.