Main

In breast cancer management, the decision to give adjuvant treatments depends on several factors such as survival benefits and complication risk. Software tools have been developed to aid this decision-making process. They estimate the probability of cancer-related mortality and relapse and calculate the percentage benefit of adjuvant treatments.

The web-based software program Adjuvant! Online (www.adjuvantonline.com) (Ravdin, 1996) was developed from the US National Cancer Institute’s SEER registry (Surveillance, Epidemiology, and End Results data and estimates). The data represented 10% of all breast cancer patients in the United States aged between 36 and 59 years and treated between 1988 and 1992. The program calculates the risk of relapse and of 10-year cancer-related mortality, with and without adjuvant therapy, after surgery for cancer. These outcomes are calculated from patient age, tumour size, tumour grade, oestrogen receptor (ER) status and nodal involvement. The model has been externally validated in different populations (Olivotto et al, 2005; Campbell et al, 2009; Mook et al, 2009; Schmidt et al, 2009; Paridaens et al, 2010; Yao-Lung et al, 2012) and is widely used by practitioners (Oakman et al, 2010). It does, however, have limitations: accuracy is reduced in high-risk groups, in which prognosis may be overestimated (young patients and those with high-grade or HER2-positive tumours) (Hajage et al, 2011; Bhoo-Pathy et al, 2012).

In 2009, the SNAP (size, nodes, and prognostic factors) method (Chen et al, 2009) (www.CancerMath.net) was developed from 1352 breast cancer patients treated at the University of Southern California/Van Nuys Breast Center between 1966 and 2006. It predicts the risk of death over the first 15 years after diagnosis and the impact of various adjuvant therapies. Survival rates are calculated from tumour size, lymph node status, ER/progesterone receptor/HER2 status, tumour grade and histological type. To date, one group has published a validation of SNAP (Michaelson et al, 2011), finding that SNAP provided accurate estimates of the risk of death when compared against large patient data sets.

A third web-based prediction tool, PREDICT (www.predict.nhs.uk), was described in 2010; it estimates 5- and 10-year breast cancer survival and the benefits of hormonal therapy, chemotherapy and trastuzumab (Wishart et al, 2010). The model was derived from cancer registry data on 5694 breast cancer patients treated in East Anglia from 1999 to 2003, using patient age, tumour size, tumour grade, number of positive nodes, ER status, Ki67 status and mode of detection. An external data set of 5468 patients from the West Midlands Cancer Intelligence Unit was used for validation (Wishart et al, 2010).

A recent review of six prediction tools (including Adjuvant!, CancerMath.net and PREDICT) and 14 biomolecular features of breast cancer found that most models were inaccurate in particular patient subgroups (Engelhardt et al, 2014). To date, data on the concordance between prediction models are limited, and further validation of multiple models against large multinational patient data sets is needed.

The aim of this study was to validate three web-based prediction tools (Adjuvant!, PREDICT and CancerMath.net) and to determine their concordance in a data set of patients with ER-positive, HER2-negative breast cancer.

Patients and Methods

Patients

METABRIC data set. We retrieved 965 ER-positive, HER2-negative cases from the METABRIC study (Molecular Taxonomy of Breast Cancer International Consortium) (Curtis et al, 2012). This data set contains over 2000 clinically annotated primary fresh-frozen breast cancer specimens, together with a subset of normal tissue samples, from tumour banks in the United Kingdom and Canada. Details of the METABRIC cohort have been published by Curtis et al (2012). The data set is publicly accessible online (METABRIC Data for Use in Independent Research, syn1688369, 2014).

To apply the models automatically to the data set, we used the HTML code of the calculators’ web pages for the PREDICT and CancerMath.net models, and generated a partition tree for Adjuvant!: the inputs that the Adjuvant! algorithm uses to estimate risk are binned into categories, so its output can take only a fixed number of values.
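The partition-tree idea can be illustrated with a minimal Python sketch. The cut-points below are hypothetical placeholders (the actual Adjuvant! categories are not reproduced here), and ER status is left out because every patient in this cohort is ER-positive. Each cell of the grid would be entered once into the calculator, and every patient then inherits the prediction stored for the cell they fall into. The names `CUTS`, `cell_key`, `all_cells` and `predict` are illustrative only.

```python
from bisect import bisect_right
from itertools import product

# Hypothetical cut-points (placeholders, NOT the real Adjuvant! categories).
# Bins are half-open intervals: a value v falls in bin k when exactly k cut-points are <= v.
CUTS = {
    "age": [35, 50, 60, 70],   # years
    "size_mm": [10, 20, 50],   # tumour size in mm
    "nodes": [1, 4, 10],       # number of positive nodes
}
GRADES = [1, 2, 3]
FEATURES = ("age", "size_mm", "nodes")


def cell_key(age, size_mm, nodes, grade):
    """Map one patient to the tuple of bin indices identifying their grid cell."""
    return (
        bisect_right(CUTS["age"], age),
        bisect_right(CUTS["size_mm"], size_mm),
        bisect_right(CUTS["nodes"], nodes),
        GRADES.index(grade),
    )


def all_cells():
    """Enumerate every combination of bins: the finite set of distinct inputs."""
    ranges = [range(len(CUTS[f]) + 1) for f in FEATURES]
    ranges.append(range(len(GRADES)))
    return list(product(*ranges))


def predict(table, age, size_mm, nodes, grade):
    """Look up the prediction stored for the patient's cell (table: cell -> survival)."""
    return table[cell_key(age, size_mm, nodes, grade)]
```

With these placeholder bins the grid has 5 × 4 × 4 × 3 = 240 cells, so the calculator needs to be queried only 240 times regardless of cohort size.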

Some parameters were not available in the data set, such as the chemotherapy generation (required by the PREDICT and Adjuvant! models). We assumed that all patients had received second-generation chemotherapy, as the data were more than 10 years old.

Statistical analysis

Epidemiological and clinical continuous variables were compared using Student’s t-test (parametric) or the Mann–Whitney test (non-parametric). Categorical variables were compared using the χ2-test or Fisher’s exact test, as appropriate. P-values <0.05 were considered statistically significant. Continuous data are presented as mean ± s.d., or as median and IQR or range (min–max).
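For 2 × 2 tables, the choice between the χ2-test and Fisher’s exact test can be sketched in plain Python. The selection rule used here (χ2 only when every expected count is at least 5) is a common rule of thumb and an assumption on our part; the text does not state which criterion was applied. The χ2 statistic is computed without continuity correction, and the two-sided Fisher P-value sums the probabilities of all tables at least as extreme as the observed one. The t-test and Mann–Whitney comparisons are not sketched.

```python
import math


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence of rows and columns."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def chi2_test(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table, 1 d.f."""
    exp = expected_counts(table)
    stat = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_two_sided(table):
    """Two-sided Fisher exact P-value from the hypergeometric distribution."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # probability of the table whose top-left cell equals x
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Small relative tolerance so tables exactly as likely as the observed one are counted.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def compare_proportions(table):
    """Use chi-square when all expected counts are >= 5, otherwise Fisher's exact test."""
    if min(min(row) for row in expected_counts(table)) >= 5:
        return "chi2", chi2_test(table)[1]
    return "fisher", fisher_exact_two_sided(table)
```

For example, `compare_proportions([[3, 1], [1, 3]])` falls back to Fisher’s test (all expected counts equal 2) and gives P = 34/70 ≈ 0.49.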

As all three calculators provide 10-year overall survival estimates, we compared the models using this end point. The performance of the three web calculators was quantified in terms of calibration and discrimination. Calibration quantifies the agreement between observed frequencies and predicted probabilities; calibration curves plot observed outcome frequencies against the probabilities predicted by the models. A well-calibrated model has an intercept a=0 and a slope b=1, that is, its predictions lie on the identity line through the origin. A sensible measure of calibration is therefore the unreliability (U) index, based on a likelihood ratio statistic testing the null hypothesis that a=0 and b=1. Calibration is considered satisfactory when the P-value of this U-statistic is not significant.
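The calibration intercept a, slope b and unreliability index can be sketched in Python with NumPy. This is a minimal sketch, assuming a binary 10-year status (1 = alive at 10 years) for every patient, i.e. ignoring censoring before 10 years. It fits logit(observed) = a + b · logit(predicted) by Newton–Raphson, computes the likelihood ratio (LR) statistic against a=0, b=1 (2 d.f.), and uses Harrell’s convention U = (LR − 2)/n, which is, as we understand it, what `val.prob` in the R rms package reports. The function name `calibration` is illustrative only.

```python
import numpy as np


def _loglik(y, eta):
    # Bernoulli log-likelihood for linear predictor eta: sum(y*eta - log(1 + exp(eta))).
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def calibration(y, p, n_iter=50):
    """Return (intercept a, slope b, U index, P-value of the a=0, b=1 test)."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-8, 1 - 1e-8)
    lp = np.log(p / (1 - p))                  # logit of the predicted probability
    X = np.column_stack([np.ones_like(lp), lp])
    beta = np.array([0.0, 1.0])               # start at perfect calibration
    for _ in range(n_iter):                   # Newton-Raphson for logistic regression
        mu = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    # LR chi-square (2 d.f.): recalibrated fit versus the unchanged predictions (a=0, b=1).
    lr = max(2 * (_loglik(y, X @ beta) - _loglik(y, lp)), 0.0)  # guard against round-off
    u_index = (lr - 2) / len(y)
    p_value = float(np.exp(-lr / 2))          # upper tail of a chi-square with 2 d.f.
    return float(beta[0]), float(beta[1]), u_index, p_value
```

Under these assumptions, a significant P-value signals miscalibration, consistent with the interpretation given in the text.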

Discrimination (i.e., whether individual predictions were ranked in the correct order) was quantified in both populations with the C-index, which extends the area under the ROC curve to censored data (Pencina and D’Agostino, 2004).
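The C-index can be sketched in plain Python as Harrell’s concordance index. This is a minimal O(n²) sketch, assuming that predicted risk is expressed as 1 − predicted survival (higher = worse prognosis). A pair of patients is usable only when the one with the shorter follow-up had an observed death; pairs with tied follow-up times are skipped, and tied risks count one-half. The function name `harrell_c` is illustrative only.

```python
def harrell_c(time, event, risk):
    """Harrell's concordance index for right-censored survival data.

    time:  follow-up times
    event: 1 if death was observed, 0 if censored
    risk:  predicted risk (higher = worse prognosis, e.g. 1 - predicted survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # the shorter time of a usable pair must be an observed death
        for j in range(n):
            if time[j] > time[i]:  # patient j outlived patient i: a comparable pair
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1  # the patient who died earlier had the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the C-indices in the Results are read.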

We also calculated the Brier score, which for a given model is the mean squared difference between observed patient status and predicted probability. The lower the Brier score, the better the predictive performance.
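The Brier score can be illustrated with a short Python sketch, assuming every patient has a known binary status at 10 years (censoring before 10 years would require weighting, which is not shown). The function name `brier_score` is illustrative only.

```python
def brier_score(observed, predicted):
    """Mean squared difference between observed status (0/1) and predicted probability."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
```

As a reference point, a constant prediction equal to the observed proportion π gives a Brier score of π(1 − π), and a constant prediction of 0.5 gives 0.25.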

To assess the concordance of the models, we first compared the three predicted survival values using paired t-tests, analysis of variance (ANOVA) and correlation coefficients. We then calculated the maximal dispersion, defined as the difference between the highest and the lowest of the three predictions for the same patient. We also calculated the difference between each model’s prediction and the mean of the three predictions. Finally, we defined low-, intermediate- and high-risk groups by splitting the average risk predicted by the three models into terciles.
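The concordance measures described above can be sketched with NumPy. This is a minimal sketch, assuming a matrix with one row per patient and one column per model holding predicted 10-year survival. Patients with the lowest mean predicted survival are assigned to the high-risk tercile. How patients exactly at a cut-point were assigned is not stated in the text, so the `<=` choice here is an assumption. The paired t-tests and ANOVA are not reproduced, and `concordance_summary` is an illustrative name.

```python
import numpy as np


def concordance_summary(preds):
    """preds: array of shape (n_patients, 3), one column per model
    (e.g. Adjuvant!, PREDICT, CancerMath.net), holding predicted 10-year survival."""
    preds = np.asarray(preds, dtype=float)
    mean_pred = preds.mean(axis=1)
    # Maximal dispersion: highest minus lowest of the three predictions per patient.
    dispersion = preds.max(axis=1) - preds.min(axis=1)
    # Deviation of each model from the three-model mean (each row sums to zero).
    deviation = preds - mean_pred[:, None]
    # Tercile cut-points of mean predicted survival; lower survival means higher risk.
    q1, q2 = np.quantile(mean_pred, [1 / 3, 2 / 3])
    group = np.where(mean_pred <= q1, "high",
                     np.where(mean_pred <= q2, "intermediate", "low"))
    # 3 x 3 matrix of Pearson correlations between the models' predictions.
    corr = np.corrcoef(preds, rowvar=False)
    return dispersion, deviation, group, corr
```

Summarising `dispersion` by median and range within each value of `group` gives the kind of per-tercile figures reported in the Results.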

Data were analysed with R 2.15.2 (R Development Core Team, 2012; http://cran.r-project.org/) using the Hmisc, rms and pROC packages.

Results

We included 965 patients from the METABRIC study. Patients’ clinical and pathological characteristics are shown in Table 1. The mean age at diagnosis was 63.8 years. Most tumours were invasive ductal carcinomas (77.6%) and larger than T1 (54.7%), and Elston–Ellis grade 2 was the most frequent grade (46.5%). Sixty-one per cent of patients received adjuvant hormonal therapy, only 2.1% received chemotherapy alone and 4% received both.

Table 1 Patients’ characteristics

Median follow-up was 169 months.

The percentage survival rates at 1, 5 and 10 years were 98%, 75% and 53%, respectively (Figure 1).

Figure 1

Survival curve according to Kaplan–Meier.
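The 1-, 5- and 10-year survival rates above are read from a Kaplan–Meier curve. A minimal product-limit sketch in plain Python follows, assuming follow-up times in months (so 10 years corresponds to t = 120). Patients censored at a death time are counted as still at risk at that time, which is the usual convention. The function names are illustrative only.

```python
def kaplan_meier(time, event):
    """Kaplan-Meier product-limit estimate.

    time: follow-up times; event: 1 = death observed, 0 = censored.
    Returns (event_times, survival), the estimate just after each death time.
    """
    data = sorted(zip(time, event))
    n_at_risk = len(data)
    surv = 1.0
    times, survival = [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group records sharing the same time
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk  # conditional probability of surviving time t
            times.append(t)
            survival.append(surv)
        n_at_risk -= deaths + censored  # both leave the risk set after time t
    return times, survival


def survival_at(times, survival, t):
    """Step-function lookup: S(t) is the estimate at the last death time <= t."""
    s = 1.0
    for ti, si in zip(times, survival):
        if ti <= t:
            s = si
        else:
            break
    return s
```

For instance, `survival_at(*kaplan_meier(months, deaths), 120)` would give the 10-year survival estimate for hypothetical `months` and `deaths` lists.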

Performance of the three scores in our population

The discrimination performance of the three web-based calculators for predicting 10-year survival, measured by the C-index, was 0.67 (95% CI 0.63–0.70) for Adjuvant!, 0.74 (95% CI 0.71–0.77) for CancerMath.net and 0.72 (95% CI 0.69–0.75) for PREDICT. The Brier scores were 0.22, 0.23 and 0.21 for the PREDICT, Adjuvant! and CancerMath.net models, respectively, indicating good overall performance of each model. Mean 10-year predictions did not differ between the three models (0.62, 0.62 and 0.64 for Adjuvant!, PREDICT and CancerMath.net, respectively; P=0.07). However, predictions differed significantly within each risk group (P<0.001 for the high-, intermediate- and low-risk groups). The correlation coefficients were 0.5 (95% CI 0.50–0.59, P<0.001) between the Adjuvant! and CancerMath.net models, 0.82 (95% CI 0.79–0.83, P<0.001) between the CancerMath.net and PREDICT models and 0.85 (95% CI 0.83–0.87, P<0.001) between the PREDICT and Adjuvant! models.

The average difference between the predicted probabilities and observed survival at 10 years was 9.0% for Adjuvant! (95% CI 6.0–12.0, U-Index 0.05, P<0.001), 10.7% for CancerMath.net (95% CI 0.8–18, U-Index 0.06, P<0.001) and 8.0% for PREDICT (95% CI 5.0–11.0, U-Index 0.05, P<0.001) (Figure 2).

Figure 2

Calibration of the three models, with confidence intervals.

Median score dispersion was 13% (range 0.2–68%) in the overall population. The median dispersion was 28% (range 3–62%) in the high-risk group (as defined by terciles), 18% (range 1.1–68%) in the intermediate-risk group and 9% (range 0.2–19%) in the low-risk group. Within groups, the median score dispersion was 8% (range 6–10%) for the subgroup with a predicted survival above 90%, and from 24% (range 1–68%) to 32% (range 6–52%) for the subgroups with a predicted survival between 10 and 50% (Figure 3). Differences between the mean of the three predictions and each model’s prediction are shown in Figure 4. Although the deviations of all three models were centred on zero, large dispersion was observed in the intermediate- and high-risk groups.

Figure 3

Median dispersion of the three models’ prediction and number of events according to the risk groups.

Figure 4

Distribution of each model’s deviation from the mean, according to risk group.

Discussion

Prognosis prediction tools have been developed for personalised breast cancer treatment, allowing more informed decisions about adjuvant therapy. Using an independent data set, this study demonstrated excellent overall performance of each model to predict 10-year survival in patients with ER-positive HER2-negative breast tumours.

However, prediction tools have limited reliability. We showed that the three web-based prediction tools were accurate only for patients with an excellent prognosis (death event rate of 4–6%). The differences between the three tools’ predictions reached 30% in some subgroups with an intermediate prognosis, and the group with a poor prognosis also showed high discrepancies, with a median dispersion of 28%. This highlights the limitations of current prediction models, which show significant discrepancies, especially in high-risk populations. This may have consequences for individual patients whose risk of death from breast cancer is underestimated.

Previous validation studies of web-based prediction tools have demonstrated this problem. Version 5.0 of Adjuvant! was validated in a Canadian population of about 4000 patients; this study found significant overestimation of survival in patients younger than 35 years and in those with lymphovascular invasion (2). CancerMath.net has also been validated, showing the greatest accuracy in patients with the highest survival rates (12).

Another concern is whether prediction tools can be applied to populations from different geographical regions. Mook et al (2009) compared observed overall and breast-cancer-specific survival (BCSS) in 5380 European patients with early breast cancer with the estimates of Adjuvant!, which is based on an American data set. The authors concluded that Adjuvant! was accurate in the Dutch population and could be applied to European populations. In the subgroup of young patients (<40 years), however, Adjuvant! overestimated overall survival by 4.2% and BCSS by 4.7%. Similar overestimations of survival were found by Hajage et al (2011) in patients with poor prognostic factors (young age, high-grade and HER2-positive tumours). Other studies have confirmed similar findings (Campbell et al, 2009; Paridaens et al, 2010; Bhoo-Pathy et al, 2012; Yao-Lung et al, 2012; Jung et al, 2013).

Few studies have compared several prediction models in large, geographically diverse patient populations. One study compared PREDICT with Adjuvant! and found similar outcome estimates (14). Our study compared three web-based models in two large patient data sets representing three countries (France, United Kingdom and Canada).

Our results are consistent with previous studies showing that the greatest discrepancy in risk prediction between the three models occurs in high-risk and some intermediate-risk patients.

To our knowledge, our study is the first to compare the results of three validated web-based predictors against large international data sets. We have shown that predictions are consistent across different nationalities for patients with a good prognosis.

Our study has limitations. First, we compared Adjuvant!, the ‘gold standard’, with two recent models that are not routinely used and have undergone few external validations. However, we found no major difference in estimates, and the dispersion between models did not arise from a systematic error in any single model (Figure 4). Other scores exist to predict survival in breast cancer (Schmidt et al, 2009; Campbell et al, 2010; Jankowitz et al, 2011; Barton et al, 2012); our goal was to highlight the difficulty of predicting survival, whatever the model used. For homogeneity, we evaluated only web-based models, but other models may provide more accurate information. However, a recent review found that new models using clinical and biological factors also had problems with prediction accuracy in high-risk patient groups (Engelhardt et al, 2014). Second, some data required by the models were missing, such as the chemotherapy generation required by the PREDICT and Adjuvant! models. We therefore assumed second-generation chemotherapy for all patients, as the data were more than 10 years old. Although this may bias the absolute predictions, we believe that applying the same systematic approximation does not affect the comparison of the three models. Moreover, this approximation concerns only 6% of the patients.

Given these limitations, web-based models should be regarded as only one tool and should not be the sole basis for treatment decisions. Discrepancies in prediction in high-risk groups must be recognised when considering adjuvant treatment decisions.

However, given the cost and adverse events of adjuvant therapies, reliable tools must be developed. Genomic signatures are one option to explore, although they often suffer from the same limitations. A good risk-prediction model could incorporate both genetic profiles and clinical characteristics. We think that specific predictors are needed in high-risk populations, such as young patients and patients with high-grade, HER2-positive or triple-negative tumours.

In conclusion, this study revealed that the three web-based predictors perform equally well at the population level, but show a high degree of discordance in some intermediate-risk subgroups and in the high-risk group. Consequently, for ER-positive, HER2-negative breast cancer, accurate prediction is currently available only for patients with a good prognosis.