Main

Breast cancer is the most common cancer in women in the United Kingdom, with around 50 000 newly diagnosed cases each year (Cancer Research UK, 2014). Approximately 4% of cases occur in younger women (aged 40 years or younger at diagnosis), yet breast cancer remains the most frequent malignancy in women of this age group (Cancer Research UK, 2014).

Determining the long-term outcome and potential benefits of systemic adjuvant treatments for early-stage breast cancer has been improved through the use of a number of predictive tools, including the Nottingham Prognostic Index (NPI), Adjuvant! and PREDICT. These tools have become increasingly sophisticated, incorporating a growing number of prognostic factors and enabling a move from categorising patients into broad prognostic groups to providing survival estimates at a patient level. However, many existing prognostic tools have been validated in only a small number of younger women, for whom overestimation of overall survival (OS) by up to 30% has been reported (Engelhardt et al, 2014).

Materials and Methods

Existing tools

The original NPI was based on 387 patients treated in a single institution and used tumour size, grade, and lymph node status to provide a prognostic index (Haybittle et al, 1982). The tool was later validated (Todd et al, 1987; D’Eredita et al, 2001) and updated to provide survival estimates by NPI group (Blamey et al, 2007a), and at a patient level (Blamey et al, 2007b).
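As a minimal sketch of how such an index combines the three factors, the snippet below uses the commonly quoted form of the NPI (0.2 × tumour size in cm, plus node stage, plus grade). The formula and the 1–3 coding of node stage and grade are taken from the general literature and are not stated in this article, so they should be read as assumptions; the function name is hypothetical.

```python
def nottingham_prognostic_index(size_cm: float, node_stage: int, grade: int) -> float:
    """Commonly quoted NPI form (assumption, not given in this article):
    NPI = 0.2 * tumour size (cm) + node stage (1-3) + histological grade (1-3).
    """
    if node_stage not in (1, 2, 3):
        raise ValueError("node_stage must be 1, 2, or 3")
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2, or 3")
    if size_cm < 0:
        raise ValueError("size_cm must be non-negative")
    return 0.2 * size_cm + node_stage + grade


if __name__ == "__main__":
    # Illustrative patient: 2.0 cm tumour, node stage 1, grade 2.
    print(nottingham_prognostic_index(2.0, 1, 2))
```

A higher index corresponds to a worse prognosis; the cut-offs used to form prognostic groups are not reproduced here.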

Adjuvant! is an online decision aid tool, which was developed in 2001, based on 34 252 women aged 36–69 years from multiple institutions in the United States, and incorporated the patient’s age, tumour size and grade, oestrogen receptor (ER) status, and number of positive nodes, to provide individual 10-year outcomes (Ravdin et al, 2001). The tool was later validated in patients from Canada (Olivotto et al, 2005) and the United Kingdom (Campbell et al, 2009). Adjuvant! makes an adjustment for women under 35 years, based on data from 415 patients; however, this applies only to patients with ER-positive tumours (Aebi et al, 2000).

The online breast cancer prognostic and treatment benefit tool, PREDICT, is used to estimate survival for individual patients based on known pathological prognostic factors, including age at diagnosis, mode of detection, tumour grade, tumour size, ER and nodal status, and type of therapy. The tool was developed in the United Kingdom, based on 5694 women aged 23–95 years (401 aged 40 years or younger), from multiple institutions in East Anglia, and validated in two further cohorts: 5468 women aged 22–93 years from the West Midlands Cancer Intelligence Unit (Wishart et al, 2010); and the same cohort used for the validation of Adjuvant! (Wishart et al, 2011).

An updated version of PREDICT (http://www.predict.nhs.uk) now incorporates human epidermal growth factor receptor 2 (HER2) status (Wishart et al, 2012). The updated tool was validated using 1653 patients with information on HER2 from the same cohort of 3140 women from Canada used in the previous validations of PREDICT and Adjuvant!, and was shown to provide accurate OS and breast cancer-specific survival (BCSS) estimates at 10 years. The results demonstrated that the incorporation of HER2 status improved the tool, with predicted overall and breast cancer-specific deaths within 8.4% and 2.5% of those observed (P=0.05 and P=0.60, respectively). However, the results indicated that PREDICT underestimated the number of deaths in women aged 20–35 years by 32% for both OS and BCSS (Wishart et al, 2012).

Although PREDICT was successfully validated in a large cohort, only 159 (9.6%) of these women from Canada were aged 40 years or younger at diagnosis. Yet young age at diagnosis is associated with lower survival rates than those seen in older patients (El Saghir et al, 2006; Adami et al, 1986). The reasons for the differing effects remain unclear; studies have shown that lower survival rates coupled with higher relapse rates in younger patients are independent of other known prognostic factors (de la Rochefordiere et al, 1993; Nixon et al, 1994). Younger patients may receive less significant survival benefits from hormonal therapy after adjuvant chemotherapy than older patients (Ahn et al, 2007). Furthermore, studies of premenopausal and perimenopausal women with breast cancer have found that chemotherapy alone was insufficient for younger patients, and that tailored treatments should perhaps be investigated (Aebi et al, 2000; Colleoni et al, 2006).

The aim of this study was to investigate the performance of the updated PREDICT tool in a large cohort of women aged 40 years or younger at breast cancer diagnosis in terms of model calibration and discrimination for both OS and BCSS.

Study population

POSH is a multicentre prospective observational cohort study of 3000 young women diagnosed with breast cancer in the United Kingdom between 2000 and 2008 (http://www.southampton.ac.uk/medicine/research/posh.page). The detailed study protocol was published in 2007 (Eccles et al, 2007), and the cohort has previously been described (Copson et al, 2013a).

Information obtained in the POSH cohort included age at diagnosis, ethnicity (Caucasian/white, black, Asian, other, missing/unknown), menopausal status (pre-, peri-, and post-menopausal, or missing/unknown), family history of breast cancer, ER, progesterone receptor, and HER2 status, histology, histological grade, tumour size, number of positive lymph nodes, lymphovascular invasion status, focality (localised, multifocal, missing/unknown), presentation, gene status (BRCA1, BRCA2, other, not tested/missing/unknown), and type of adjuvant therapy. This article presents analyses conducted on follow-up data from the POSH cohort received until 22 October 2013.

Study end points were OS and BCSS at 5, 8, and 10 years, where OS is defined as time from breast cancer diagnosis to death from any cause, and BCSS as time to death from breast cancer (deaths from other causes were censored at the time of last follow-up). The cohort includes data from 2827, 1843, and 597 patients with key prognostic information needed for the analyses for OS at 5, 8, and 10 years, respectively. Cause of death was missing for 5 patients leaving 2822, 1841, and 595 patients for analyses for BCSS, respectively.
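The two end point definitions can be sketched as follows. This is an illustration only: the `Patient` record and its field names are hypothetical and are not the POSH data dictionary. It shows that a death from another cause counts as an event for OS but is censored for BCSS, and that a patient with a missing cause of death is excluded from the BCSS analysis.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Patient:
    # Hypothetical record layout, for illustration only.
    years_followed: float                        # diagnosis to death or last follow-up
    died: bool                                   # death from any cause
    breast_cancer_death: Optional[bool] = None   # None means cause of death is missing


def os_endpoint(p: Patient) -> Tuple[float, int]:
    """OS: time from diagnosis, with event = death from any cause."""
    return p.years_followed, int(p.died)


def bcss_endpoint(p: Patient) -> Optional[Tuple[float, int]]:
    """BCSS: event = death from breast cancer; other deaths are censored.

    Returns None when the patient died and the cause is missing, so the
    patient drops out of the BCSS analysis.
    """
    if p.died and p.breast_cancer_death is None:
        return None
    event = int(p.died and bool(p.breast_cancer_death))
    return p.years_followed, event


if __name__ == "__main__":
    other_cause = Patient(years_followed=6.2, died=True, breast_cancer_death=False)
    print(os_endpoint(other_cause))    # event for OS
    print(bcss_endpoint(other_cause))  # censored for BCSS
```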

Predicted OS and BCSS were calculated for each patient using PREDICT by investigators blinded to actual patient outcomes. Age at diagnosis, tumour size, and number of positive lymph nodes were entered as continuous variables. Categorical variables were used for presentation (screen-detected, symptomatic, or unknown), histological grade (1, 2, 3, or unknown), ER status (negative (ER−) or positive (ER+)), HER2 (negative (HER2−), positive (HER2+), or unknown), and chemotherapy regimen (second generation or third generation).

Model calibration and discrimination

Using a similar approach to the methods applied in previous validations of PREDICT (Wishart et al, 2011, 2012), predicted OS and BCSS were compared with the corresponding observed OS and BCSS at 5, 8, and 10 years. Model calibration, a comparison between the predicted and observed mortality, was evaluated for the complete data set, by quartiles of the predicted mortality and also within strata (Wishart et al, 2012). As evidence suggests that existing prognostic tools do not perform well in Asian patients (Engelhardt et al, 2014), and black ethnicity was found to be an independent risk factor for reduced survival in younger women (Copson et al, 2013b), model calibration was also evaluated across ethnic groups. Goodness-of-fit tests were performed using χ² tests based on the number of predicted and observed events (4 d.f.). Model discrimination was assessed by calculating the area under the receiver operating characteristic curve (AUC) and corresponding 95% confidence intervals for 5-, 8-, and 10-year predicted all-cause mortality and breast cancer-specific mortality. The AUC is the probability that the predicted mortality of a randomly selected patient who died is higher than that of a randomly selected patient who survived; the higher the AUC, the better the model is at identifying patients with a worse survival (Wishart et al, 2012).
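The calibration check by quartiles of predicted risk might be sketched as below. The exact test statistic is not given in the text, so the Pearson form Σ(O − E)²/E over the four quartile groups, referred to a χ² distribution with 4 d.f., is an assumption, and all function names are hypothetical. The closed-form tail probability used here holds only for 4 d.f.

```python
import math
from statistics import quantiles
from typing import List, Sequence, Tuple


def quartile_groups(pred_risk: Sequence[float]) -> List[int]:
    """Assign each patient to quartile 0-3 of predicted mortality risk."""
    cuts = quantiles(pred_risk, n=4)  # three cut points
    return [sum(r > c for c in cuts) for r in pred_risk]


def calibration_chi2(
    pred_risk: Sequence[float], died: Sequence[int]
) -> Tuple[List[int], List[float], float]:
    """Observed and predicted deaths per quartile, plus sum((O - E)^2 / E).

    Assumes every quartile has a non-zero predicted number of deaths.
    """
    observed = [0, 0, 0, 0]
    expected = [0.0, 0.0, 0.0, 0.0]
    for g, r, d in zip(quartile_groups(pred_risk), pred_risk, died):
        expected[g] += r   # predicted deaths = sum of individual risks
        observed[g] += d
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, stat


def chi2_sf_4df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 4 d.f."""
    return math.exp(-x / 2) * (1 + x / 2)


if __name__ == "__main__":
    # Toy data: four risk levels, ten patients each.
    risks = [0.1] * 10 + [0.2] * 10 + [0.3] * 10 + [0.4] * 10
    deaths = ([1] + [0] * 9) + ([1] * 2 + [0] * 8) + ([1] * 3 + [0] * 7) + ([1] * 4 + [0] * 6)
    obs, exp, stat = calibration_chi2(risks, deaths)
    print(obs, [round(e, 2) for e in exp], round(stat, 4), round(chi2_sf_4df(stat), 4))
```

A small P-value indicates that predicted and observed deaths differ by more than chance would explain, that is, poor calibration.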

All analyses were performed using STATA v12.1 (StataCorp LP, College Station, TX, USA).

Results

Model calibration

The demographic, tumour, and treatment information at baseline together with the predicted and observed all-cause OS at 5, 8, and 10 years are shown in Table 1. Overall, PREDICT did not perform well at 5 years, with an underestimation of the total number of deaths by 25% (455 vs 607, P<0.001), and within most subgroups; most notably in patients with grade 2 tumours (58%; 67 vs 161, P<0.001), 0–10 mm tumours (52%; 20 vs 42, P=0.001), ER+ tumours (56%; 158 vs 362, P<0.001), and patients receiving both hormone and chemotherapy (55%; 147 vs 326, P<0.001). Conversely, PREDICT overestimated the number of deaths by 21% (297 vs 245, P=0.0009) in patients with ER− tumours and patients receiving adjuvant trastuzumab (data not shown) by 25% (86 vs 69, P=0.041). Across HER2 and ethnicity subgroups, PREDICT was found to underestimate all-cause mortality in HER2− tumours, Caucasian/white and Black ethnicity (33%; 255 vs 383, P<0.001, 25%; 413 vs 547, P<0.001, and 42%; 21 vs 36, P=0.012 respectively), and slightly underestimated for HER2+ and borderline tumours. In addition, when looking at chemotherapy regimen (data not shown) the tool was found to underestimate the number of deaths by 28% (297 vs 410, P<0.001) and by 18% (146 vs 177, P=0.020) for patients receiving second- and third-generation chemotherapy, respectively. Despite a poor performance across most subgroups at 5 years, the tool did perform well in certain subgroups; notably in patients with tumours >50 mm (14% underestimation; 85 vs 99, P=0.159), patients receiving both radiotherapy and breast conserving surgery (11% underestimation; 168 vs 188, P=0.145), and in Asian patients (7% underestimation; 14 vs 15, P=0.796).
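The percentages quoted above appear to be the difference between predicted and observed deaths expressed relative to the observed count. A minimal sketch of that arithmetic, assuming this definition and using two of the 5-year figures from the paragraph above (the function name is hypothetical):

```python
from typing import Tuple


def calibration_gap(predicted: float, observed: float) -> Tuple[str, int]:
    """Direction and size (whole percent) of the gap, relative to observed deaths."""
    if observed <= 0:
        raise ValueError("observed must be positive")
    diff = 100.0 * (predicted - observed) / observed
    if diff > 0:
        direction = "overestimate"
    elif diff < 0:
        direction = "underestimate"
    else:
        direction = "exact"
    return direction, round(abs(diff))


if __name__ == "__main__":
    # Whole cohort at 5 years: 455 predicted vs 607 observed deaths.
    print(calibration_gap(455, 607))
    # ER-negative tumours at 5 years: 297 predicted vs 245 observed deaths.
    print(calibration_gap(297, 245))
```

Because the predicted counts in the text are themselves rounded, recomputed percentages can differ slightly from the published ones.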

Table 1 Observed and predicted 5-, 8-, and 10-year all-cause mortality by demographic, tumour, and treatment characteristics

At 8 years, the performance of PREDICT improved considerably, with the number of predicted deaths within 6% of those observed across the entire cohort (430 vs 454, P=0.260), and the tool performing well across most subgroups. Notable improvements to the predicted all-cause mortality from 5 to 8 years were found in patients aged 36–40 years (23%, P<0.001, to 1%, P=0.845), patients with a ductal morphology (23%, P<0.001, to 2%, P=0.688), negative nodal status (21%, P=0.009, to 4%, P=0.704), 21–50 mm tumours (23%, P<0.001, to 4%, P=0.551), patients who had radiotherapy and a mastectomy (27%, P<0.001, to 8%, P=0.236), and patients receiving second- or third-generation chemotherapy (28%, P<0.001, to 7%, P=0.201 and 18%, P=0.020, to 4%, P=0.705, respectively). The performance of the tool had also greatly improved across all ethnicity subgroups and in patients with HER2+ and borderline tumours. The number of predicted deaths remained much lower than those observed for patients with grade 2 tumours, 0–10 mm tumours, patients with a lobular morphology, and patients receiving both hormone and chemotherapy. There was also greater disparity in the predicted vs observed number of deaths at 8 years compared with 5 years when looking at patients receiving adjuvant trastuzumab and at ER subgroups, with all-cause mortality overestimated for patients receiving adjuvant trastuzumab by 58% (52 vs 33, P=0.001), and overestimated for ER− tumours by 32% (242 vs 183, P<0.001) and underestimated for ER+ tumours by 31% (188 vs 271, P<0.001).

In contrast to the underestimation of the total number of deaths at 5 and 8 years, PREDICT was found to overestimate the number of deaths at 10 years by 8% (164 vs 152, P=0.330). The number of deaths remained overestimated for patients with ER− tumours (47%; 94 vs 64, P<0.001), grade 3 tumours (25%; 129 vs 103, P=0.010), and tumours larger than 50 mm (115%; 28 vs 13, P<0.001). In addition, the tool was found to overestimate all-cause mortality in patients aged 36–40 years (18%; 90 vs 76, P=0.108), patients with tumours without lymphovascular invasion (26%; 59 vs 47, P=0.080), patients receiving chemotherapy alone (32%; 94 vs 71, P=0.006), and patients receiving third-generation chemotherapy (31%; 42 vs 32, P=0.077). Nevertheless, the 10-year estimates predicted by the tool had much improved across most subgroups, including ethnicity subgroups and HER2 subgroups.

The number of predicted vs observed breast cancer-specific deaths was very similar to the results for OS, with no notable differences between the OS and BCSS results when comparing these both by year (5, 8, and 10 years) and across subgroups (Supplementary Table 1).

The comparison between the predicted and observed all-cause mortality by quartiles of the predicted risk both across years and split by ER status is presented in Figure 1. PREDICT was found to overestimate the total number of deaths for patients with ER− tumours but underestimate for the ER+ subgroup. Goodness-of-fit test P-values identified that PREDICT performed better at 10 years compared with 5 and 8 years (P<0.001, P=0.00284, and P=0.0295 for ER− tumours at 5, 8, and 10 years, respectively, and P<0.001, P<0.001 and P=0.0183 for ER+ tumours).

Figure 1

Calibration plots of observed OS outcomes with 95% confidence intervals against predicted outcomes by quartiles of the predicted risk, by ER status. OS outcomes for patients with ER− tumours at (A) 5 years, (B) 8 years, and (C) 10 years, and OS outcomes for patients with ER+ tumours at (D) 5 years, (E) 8 years, and (F) 10 years.

The findings were almost identical when comparing the predicted and observed breast cancer-specific mortality by year and ER status (Supplementary Figure 1). Moreover, when comparing the predicted and observed all-cause mortality by ER status for patients with HER2− tumours only (Figure 2) and HER2+ tumours only (Figure 3), the pattern of overestimating mortality in patients with ER− tumours and underestimating mortality in ER+ tumours remained evident across years. The overestimation of the tool was also apparent when comparing the number of predicted vs observed all-cause deaths in patients with both ER− and HER2+ tumours (data not shown), with an overestimation at 5, 8, and 10 years of 38% (101 vs 73, P=0.001), 39% (78 vs 56, P=0.003), and 68% (32 vs 19, P=0.003), respectively. For patients with both ER− and HER2− tumours, the overestimation was not as large (8% (P=0.301), 18% (P=0.060), and 22% (P=0.188), respectively). For patients with both ER+ and HER2− tumours, large differences between the predicted and observed deaths could be seen at 5 and 8 years but not at 10 years. Underestimation was prominent in patients with both ER+ and HER2+ tumours, with an underestimation at 5, 8, and 10 years of 46% (59 vs 101, P<0.001), 26% (63 vs 85, P=0.017), and 32% (21 vs 31, P=0.072), respectively.

Figure 2

Calibration plots of observed OS outcomes with 95% confidence intervals against predicted outcomes by quartiles of the predicted risk for HER2− patients only, by ER status. OS outcomes for patients with ER− tumours at (A) 5 years, (B) 8 years, and (C) 10 years, and OS outcomes for patients with ER+ tumours at (D) 5 years, (E) 8 years, and (F) 10 years.

Figure 3

Calibration plots of observed OS outcomes with 95% confidence intervals against predicted outcomes by quartiles of the predicted risk for HER2+ patients only, by ER status. OS outcomes for patients with ER− tumours at (A) 5 years, (B) 8 years, and (C) 10 years, and OS outcomes for patients with ER+ tumours at (D) 5 years, (E) 8 years, and (F) 10 years.

Model discrimination

PREDICT provided a reasonably high degree of discrimination for OS across years (data not shown) and when splitting by ER and HER2 status (Figure 4). Model discrimination was slightly better for patients with ER+ tumours across all years (AUC: 0.718 vs 0.730, 0.709 vs 0.748, 0.694 vs 0.724 for ER− vs ER+ tumours at 5, 8, and 10 years, respectively). Conversely, model discrimination was better for patients with HER2− tumours, particularly at 10 years (AUC: 0.724 vs 0.592 for HER2− vs HER2+ tumours). Similar findings were apparent for BCSS (Supplementary Figure 2).

Figure 4

ROC curves for OS: split by ER status at (A) 5 years, (B) 8 years, and (C) 10 years; and split by HER2 status at (D) 5 years, (E) 8 years, and (F) 10 years.
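The AUC values reported in this section follow the probabilistic definition given in the Methods: the chance that a randomly chosen patient who died had a higher predicted mortality than a randomly chosen survivor. A minimal sketch of that pairwise calculation is shown below. It treats the outcome at a fixed time point as binary and ignores censoring, and it counts tied predictions as half a concordant pair; both are simplifying assumptions, and the function name is hypothetical.

```python
from typing import Sequence


def auc_concordance(pred_risk: Sequence[float], died: Sequence[int]) -> float:
    """AUC as the proportion of (died, survived) pairs ranked correctly.

    Ties in predicted risk count as 0.5 (an assumed convention).
    """
    dead = [r for r, d in zip(pred_risk, died) if d]
    alive = [r for r, d in zip(pred_risk, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one death and one survivor")
    score = 0.0
    for rd in dead:
        for ra in alive:
            if rd > ra:
                score += 1.0
            elif rd == ra:
                score += 0.5
    return score / (len(dead) * len(alive))


if __name__ == "__main__":
    # Toy data: three of the four (died, survived) pairs are ranked correctly.
    print(auc_concordance([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation of patients who died from those who survived.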

Discussion

This study has demonstrated that the prognostic tool PREDICT is able to provide accurate long-term (8- and 10-year) outcomes for younger women with breast cancer but provides only limited accuracy regarding short-term (5-year) survival, for both OS and BCSS.

High performance in terms of accurate predictions in the number of deaths was prominent in a number of subgroups across both years and survival type. These subgroups included patients receiving both radiotherapy and breast conserving surgery, patients with grade 1 tumours, HER2+ tumours, and patients with both ER− and HER2− tumours. Although the number of Asian patients included in the analyses at 5, 8, and 10 years was small (84, 55, and 16, respectively), PREDICT performed well in this subgroup of the POSH cohort, which was contrary to previous findings on a number of prognostic tools (Engelhardt et al, 2014). Our findings also demonstrated that in Caucasian/white and Black ethnicity subgroups, PREDICT was able to provide accurate long-term, but not short-term, estimates.

A key area in which PREDICT could improve its prognostic ability is in patients with ER− tumours. The number of predicted deaths across both years and survival type was overly pessimistic in this subgroup as demonstrated by the calibration plots (Figure 1 and Supplementary Figure 1). These results were contrary to the previously published validation of PREDICT for patients aged 20–85 years, in which the tool was found to accurately predict 10-year OS and BCSS in the ER− subgroup (Wishart et al, 2012).

Splitting the ER− subgroup by HER2 status, it was evident that there is some disparity between the PREDICT estimates for HER2− and HER2+ patients in this subgroup, with the tool providing accurate predictions for patients with both ER− and HER2− tumours across years and survival type, while consistently overestimating the number of deaths for patients with both ER− and HER2+ tumours. In contrast, the tool underestimated the number of deaths in patients with ER+ tumours, when looking not only at the subgroup as a whole but also when further splitting by HER2 status.

In relation to HER2 subgroups overall, the tool was able to predict the number of deaths for both OS and BCSS to within 1% of those observed for the HER2− subgroup, with corresponding 10-year AUCs of 0.724 and 0.718, respectively. PREDICT was not quite as reliable in terms of identifying patients with a worse survival in the HER2+ subgroup (AUC=0.592 for both 10-year OS and BCSS), although the predicted number of deaths for OS and BCSS were within 6% of those observed. Fewer patients with HER2+ tumours were available in the 10-year evaluation (n=140 for both OS and BCSS) compared with HER2− tumours (n=327 and n=325, respectively), which might have contributed to the reduction in accuracy of PREDICT for patients with HER2+ tumours.

In terms of OS and BCSS overall, the findings of our study demonstrated that PREDICT performs equally well, whereas the Wishart et al validation showed that PREDICT is slightly better at providing BCSS estimates compared with OS estimates (Wishart et al, 2012). The validation also found that PREDICT underestimated the number of deaths at 10 years by 32% for patients aged 20–35 years (Wishart et al, 2012). In contrast to this, our study of outcomes for patients aged 18–40 years at diagnosis showed an overestimation in 10-year all-cause mortality of 8%. We explored this a little further by dichotomising age at diagnosis into two groups (data not shown), <35 and 35–40 years. Surprisingly, for the group aged <35 years, the number of deaths predicted (OS) was within 2% of those observed (60 vs 61, P=0.898) and within 14% (104 vs 91, P=0.173) for patients aged 35–40 years, indicating that differences in response to treatment between these two age groups could lead to some disparity in the performance of PREDICT between them.

There could be several reasons for the differences found in this study compared with other studies. The similarity of the OS and BCSS results of our study is likely due to the fact that competing mortality does not play as important a role within this age group; excluding five patients with a missing/unknown cause of death, over 96% of deaths were due to breast cancer. In addition, PREDICT was developed to provide long-term (10-year) outcomes, not short-term, which might explain the reduced accuracy of the 5-year estimates. A possible reason for poor performance in patients aged 20–35 years is the low numbers on which the model was based; only 401 women aged 40 years or younger were used in the development of PREDICT (131 patients had ER− and 270 ER+ tumours). Furthermore, the number of patients aged 40 years or younger evaluated in the validation of the enhanced PREDICT tool was relatively small (n=159; Wishart et al, 2012). It should be noted that poorer prediction in younger patients is a common finding across a number of other prognostic tools, which were found to overestimate OS by up to 30% in younger women (Engelhardt et al, 2014). As PREDICT did not perform well in a number of subgroups in our study, including the ER− subgroup, further modifications to PREDICT using this larger data set could improve survival estimates for younger breast cancer patients. Furthermore, inclusion of the proliferation marker KI67 in the PREDICT model has led to a statistically significant improvement in the performance of the model for ER+ patients (Wishart et al, 2014), which may also improve prognostication for younger patients.

There are some limitations to our study, which should be taken into account when interpreting the results. Fewer than half the patients from the POSH cohort had reached 10 years from diagnosis at the time of this analysis and so only a relatively small number could be included in the 10-year comparison (n=607). This is, however, still considerably larger than the number of women aged 40 years or younger in the validation of PREDICT (n=159; Wishart et al, 2012). Our study also demonstrated an overestimation of all-cause mortality at 10 years, so arguably longer follow-up may not improve the estimates. Our data confirm a need for caution in extrapolating data from older cohorts to inform management in young patients with breast cancer. They also confirm the need to investigate treatment approaches in trials involving sufficiently large numbers of younger women, which would allow independent analysis to determine whether there are major outcome differences and to understand why. More treatment approaches specifically directed to younger patients with breast cancer should be investigated in trials (Aebi et al, 2000; Colleoni et al, 2006; Narod, 2012). An additional limitation is that adjuvant Herceptin has been used routinely in the United Kingdom since 2005, so patients diagnosed with HER2+ disease before this date will only have received Herceptin in the metastatic setting. It is therefore possible that the outcome of HER2+ patients in the POSH cohort as a whole is inferior to that of HER2+ patients diagnosed and treated in the United Kingdom since 2005. However, the impact of this is likely to be minimal as the number of patients with HER2+ tumours diagnosed before 2005 (n=298) and since 2005 (n=381) included in this analysis is relatively small.

In conclusion, this study has demonstrated that PREDICT, a web-based tool that is easy to navigate for both patients and users, is a valuable resource in providing accurate and reliable long-term outcomes for younger patients. Although caution should currently be used when interpreting the short-term survival estimates in younger patients and the long-term estimates of younger patients with ER− tumours, it is intended that future modifications of PREDICT will include the incorporation of the POSH data set to allow for more robust estimates for younger women with breast cancer. Accurate prediction of outcome at both short- and long-term time points may be particularly important to women trying to determine the optimal timing of risk-reducing mastectomies.