Randomised controlled trial of mammographic screening in women from age 40: predicted mortality based on surrogate outcome measures

A trial in the UK to study the effect on mortality from breast cancer of invitation for annual mammography from the age of 40–41 has randomised a total of 160 921 women in the ratio 1 : 2 to the intervention and control arms. All breast cancers diagnosed in the two arms have been identified, and the histology reviewed. This paper presents the results of an interim analysis using surrogate outcome measures to compare predicted breast cancer mortality in the two arms based on 1287 cases diagnosed to 31.12.1999. Due to earlier diagnosis, there is currently an 8% excess of invasive breast cancers in the intervention arm. The ratio of predicted deaths at 10 years in the intervention arm relative to the control arm, adjusted for this excess diagnosis, ranges from 0.89 (95% confidence interval (CI) 0.78–1.01) to 0.90 (95% CI 0.80–1.01). Screening from age 40 may result in a lower reduction in breast cancer mortality than that observed in other trials including women below age 50. This analysis based on surrogate outcome measures suggests that a reduction in breast cancer mortality may be observed in this trial. However, a number of assumptions have been necessary and firm conclusions must await the analysis of observed mortality from breast cancer.

At the time the NHS breast screening programme (NHSBSP) was introduced in 1988, evidence from the randomised controlled trials suggested that the benefit of screening was restricted to women aged 50 and over, and it was decided to include women aged 50–64 in the invitation system. The current Age Trial was established to investigate the benefit of screening in younger women, specifically the effectiveness of inviting women annually between the ages of 40 and 47–48.
The primary outcome measure of the trial is mortality from breast cancer. However, detailed pathology information on all breast cancer cases in the trial is also being collected, to permit an earlier analysis based on surrogate outcome measures.

MATERIALS AND METHODS
The methodology of the trial has been described in detail elsewhere (Moss, 1999). Briefly, 160 921 women have been randomised in the ratio 1 : 2 to an intervention arm and to a control arm.
Randomisation is individual, stratified by GP practice. Women in the intervention arm are invited annually for screening by mammography (by two views at first screen, and one view thereafter unless otherwise indicated). Women in the control arm receive usual medical care. The original protocol to offer each woman seven annual screens was subsequently extended to include invitations up to the calendar year of each woman's 48th birthday. Ethics approval was obtained from London (formerly North Thames) MREC.

Sample size
The trial was designed to randomise 195 000 women aged 40–41 at entry, in order to have 80% power to detect a 20% reduction in breast cancer mortality at 10 years of follow-up in the intervention arm, at the 5% significance level, using a one-sided test. This was based on an estimated mortality of 3.3 per 1000 in the control arm in women free from breast cancer at entry into the trial (Moss, 1997). In 1999, recruitment was closed at 160 921 women due to difficulties in recruiting new centres, the last women having been randomised in 1997; the revised power is 73%.
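The effect of the shortfall in recruitment on power can be illustrated with a rough calculation. The sketch below uses a simple one-sided normal approximation for the difference between two proportions, with the 1 : 2 allocation, the control-arm mortality of 3.3 per 1000 and the 20% reduction stated above. The function name and the choice of approximation are assumptions made for illustration; the trial's own power calculation is not described here, and this crude version (which ignores non-attendance, for example) should not be expected to reproduce the published 80% and 73% exactly.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(total_women, control_rate=3.3 / 1000, reduction=0.20,
                 alpha=0.05, intervention_fraction=1 / 3):
    """Approximate one-sided power to detect a difference in two proportions.

    Assumption: a plain normal approximation with unpooled variance; this is
    an illustrative stand-in, not the method used by the trial investigators.
    """
    n_int = total_women * intervention_fraction   # 1 : 2 allocation
    n_ctl = total_women - n_int
    p_ctl = control_rate
    p_int = control_rate * (1 - reduction)
    se = sqrt(p_int * (1 - p_int) / n_int + p_ctl * (1 - p_ctl) / n_ctl)
    z_alpha = NormalDist().inv_cdf(1 - alpha)      # one-sided critical value
    return NormalDist().cdf((p_ctl - p_int) / se - z_alpha)


planned = approx_power(195_000)    # original recruitment target
achieved = approx_power(160_921)   # women actually randomised
print(f"planned: {planned:.2f}, achieved: {achieved:.2f}")
```

The point of the comparison is the direction and rough size of the loss in power when recruitment stops early, rather than the absolute values.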

Trial population
Recruitment of centres took place between 1991 and 1996 and included 23 NHSBSP breast screening units in England, Wales and Scotland. Figure 1 shows the number of women randomised by trial arm; a total of 60 women have been excluded from analyses for reasons given in the flowchart. As at March 2002, women in the intervention arm had been offered a mean of 6.6 screens. Uptake of invitation is around 70%, and 84% of women randomised to the intervention arm were still invited at round 5. Details of uptake and screening outcomes are given elsewhere (Moss et al, 2005).
All women are 'flagged' at the NHS central register, which supplies information on all deaths and cancer registrations in the trial population. In addition, information on breast cancer cases is obtained from pathology laboratories and cancer registries and cross matched against the trial population database. Breast cancer cases included in this analysis are those diagnosed from trial entry up to 31.12.1999, a period for which ascertainment of cancers was estimated to be reasonably complete, based on age-specific incidence rates for England and Wales.

Pathology data
For all trial cases identified, the original pathology report and representative histology slides are requested from the relevant laboratory. Each case is reviewed by a panel of three consultant histopathologists, and a consensus diagnosis reached on tumour size, nodal status, grade and histological type.
The review process has been described in detail elsewhere (Anderson et al, 2002). The pathology variables have been combined to calculate different prognostic indices. The Nottingham prognostic index (NPI) includes size, node status and grade according to the formula (0.2 × size (cm) + grade (1–3) + nodes (1–3), where 1 = node negative, 2 = 1–3 nodes positive and 3 = ≥3 nodes positive) (Blamey, 1996). The index has previously been validated on series of clinical breast cancer cases using cutoff points for four prognostic groups (Blamey, 1996; Todd et al, 1997). Recently, the groupings have been reformulated for cases diagnosed since 1987 (post use of adjuvant therapy) and five prognostic groups identified; information on the 10-year survival of each group and on the baseline hazard and hazard ratios for individual components has been supplied by the Nottingham group (Sarah Pinder, personal communication).
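As an illustration, the NPI formula quoted above can be written as a short function. This is a minimal sketch: the function names are hypothetical, and because the node categories as quoted overlap at exactly three positive nodes, the code assumes that three positive nodes fall in category 2 and more than three in category 3.

```python
def node_category(positive_nodes):
    """Map a count of positive nodes to the NPI node score (1-3).

    Assumption: exactly three positive nodes are scored 2, more than three
    are scored 3 (the quoted category boundaries overlap at three).
    """
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return 1
    if positive_nodes <= 3:
        return 2
    return 3


def nottingham_prognostic_index(size_cm, grade, positive_nodes):
    """NPI = 0.2 x size (cm) + grade (1-3) + node category (1-3)."""
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2 or 3")
    return 0.2 * size_cm + grade + node_category(positive_nodes)


# Example: a 2.0 cm, grade 2, node-negative tumour.
example_npi = nottingham_prognostic_index(2.0, 2, 0)
print(round(example_npi, 1))  # 3.4
```

Lower values of the index correspond to better prognosis; the cut-points that define the prognostic groups are not given in this paper and are therefore not coded here.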
Two other prognostic indices have also been proposed. The Swedish Two-County Studies Survival index uses information on size, lymph node status, grade and histological type, and has been developed using data from the randomised trial (Tabar et al, 1995b). The relative hazards of different factors, including lymph node status, have previously been validated against observed mortality reduction in the trial in different age-groups including women aged 40–49 (Tabar et al, 1995a). Updated values for the hazard ratios were obtained for this analysis (Duffy S, personal communication). Assuming a 10-year baseline death rate of 0.04, the hazard was calculated for each cancer, and 10-year survival estimates calculated for quantiles of this hazard in the total data set.
The EDCAT index was developed using data from the Edinburgh Randomised Trial of breast cancer screening, and includes size, histological type, histological grade and node status/number positive (Anderson et al, 2000). Size is included as six categories as opposed to a continuous variable (1–9, 10–14, 15–19, 20–29, 30–49 and 50+ mm). Histological type is classified as 1 = special, 2 = non-special. Grade and nodal status are categorised as for the NPI. An index was formulated based on the hazard ratio from a multivariate analysis including all four factors, and survival associated with four groups (of equal numbers) calculated. A simplified index formula is 0.7 × size group + 1 × type + 1 × grade + 1 × node group. Information on the baseline hazard and hazard ratios for individual components has been supplied by the investigators (Alexander F, personal communication).
For all the pathology variables in the present study, the proportion unknown varies between the control and intervention arms. However, for nearly all the invasive cases at least one of the variables was known: 1124 using NPI and 1126 using the Edinburgh index. For cases with missing data where at least one variable was known, we predicted the prognostic index for the NPI and Edinburgh index by fitting a regression model using those cases where information was available on all factors. Where all variables are unknown, the mean hazard of cases where at least one factor is missing has been used. The Swedish Two Counties model included a separate hazard ratio for cases where one or more components were missing.
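The simplified Edinburgh (EDCAT) index can be sketched in the same way. The numbering of the six size categories as 1 to 6 in increasing order of size is an assumption (the text lists the categories but not their codes), and the function and constant names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds (mm) of size groups 2..6; group 1 covers sizes below 10 mm.
# Assumption: groups are coded 1-6 in increasing order of size.
SIZE_GROUP_LOWER_BOUNDS_MM = [10, 15, 20, 30, 50]


def size_group(size_mm):
    """Return the size group (1-6) for a tumour size in millimetres."""
    return bisect_right(SIZE_GROUP_LOWER_BOUNDS_MM, size_mm) + 1


def simplified_edcat_index(size_mm, histological_type, grade, node_group):
    """0.7 x size group + type (1 special, 2 non-special) + grade + node group."""
    if histological_type not in (1, 2):
        raise ValueError("histological_type must be 1 (special) or 2 (non-special)")
    if grade not in (1, 2, 3) or node_group not in (1, 2, 3):
        raise ValueError("grade and node_group must each be 1, 2 or 3")
    return 0.7 * size_group(size_mm) + histological_type + grade + node_group


# Example: 12 mm, non-special type, grade 2, node negative.
example_edcat = simplified_edcat_index(12, 2, 2, 1)
print(round(example_edcat, 1))  # 6.4
```

Treating size as a small number of ordered groups, rather than as a continuous measurement, is what distinguishes this index from the NPI in the way size enters the score.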

Statistics
Person-years were calculated from date of entry to date of death, date of diagnosis of in situ or invasive cancer, or to 31.12.1999, whichever was earliest.
For each of the prognostic indices, the predicted number of cancers in each arm surviving at 10 years was calculated by multiplying the 10-year survival rates by the numbers of cancers in each prognostic group and summing over all groups. Cancers with an unknown prognostic category were excluded from these calculations.
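The calculation described above is a weighted sum over prognostic groups. A minimal sketch follows; the group labels, case counts and survival rates are invented purely for illustration and are not trial data.

```python
def predicted_survivors(cases_by_group, survival_by_group):
    """Sum over groups of (number of cancers x 10-year survival rate).

    Groups without a survival rate (for example, cancers with an unknown
    prognostic category) are left out of the sum.
    """
    return sum(
        n_cases * survival_by_group[group]
        for group, n_cases in cases_by_group.items()
        if group in survival_by_group
    )


# Hypothetical illustration only: two prognostic groups plus unknowns.
example_survival = {"good": 0.90, "poor": 0.60}
example_cases = {"good": 100, "poor": 50, "unknown": 7}
example_total = predicted_survivors(example_cases, example_survival)
print(round(example_total, 1))  # 120.0
```

Dividing the result by the number of cancers with a known category gives the predicted proportion surviving at 10 years in that arm.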
To take account of the excess of cancers in the intervention arm due to earlier diagnosis, the probability (p) of each screen-detected cancer remaining asymptomatic to the end of the follow-up period (31.12.1999 or date of death) was calculated using the formula $p(t > n) = e^{-\lambda n}$, where $1/\lambda$ is the mean sojourn time (MST), assuming the sojourn time to be exponentially distributed (Paci et al, 2004). This probability was summed for the screen-detected cancers in each prognostic category, and the total subtracted from the observed number of cancers. A mean sojourn time of 1.0 years for invasive cancers resulted in approximately equal incidence rates in the trial arms; sensitivity analyses were conducted using values of 0.75 and 1.25. When in situ cancers were included (for the analysis using the Swedish Two Counties index), a mean sojourn time of 1.75 years resulted in approximately equal rates; the sensitivity analysis used values of 1.5 and 2.0.
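The adjustment for excess diagnosis can be sketched as follows. The code assumes that n is the time in years from screen detection to the end of follow-up, which is how the description above reads but is not stated explicitly; the function names and the example follow-up times are hypothetical.

```python
from math import exp


def prob_still_asymptomatic(years_to_end_of_follow_up, mean_sojourn_time):
    """p(t > n) = exp(-n / MST) for an exponentially distributed sojourn time."""
    return exp(-years_to_end_of_follow_up / mean_sojourn_time)


def adjusted_case_count(observed_cases, follow_up_years_screen_detected,
                        mean_sojourn_time):
    """Observed cancers minus the expected number of screen-detected cancers
    that would still have been asymptomatic at the end of follow-up."""
    expected_excess = sum(
        prob_still_asymptomatic(n, mean_sojourn_time)
        for n in follow_up_years_screen_detected
    )
    return observed_cases - expected_excess


# Hypothetical prognostic category: 10 observed cancers, 3 of them screen
# detected 0.5, 2.0 and 4.0 years before the end of follow-up.
for mst in (0.75, 1.0, 1.25):
    print(mst, round(adjusted_case_count(10, [0.5, 2.0, 4.0], mst), 2))
```

A longer mean sojourn time removes more cases, which is why the sensitivity analyses vary the MST around the value that equalises incidence in the two arms.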
To take account of the effect of lead-time due to advancing the date of diagnosis by screening, the probability of death within 10 years of date of entry/randomisation has been calculated using the hazard ratios for the individual components for each of the three indices. These calculations used Cox proportional hazard models. In the case of the NPI and the Edinburgh index, these were applied to baseline hazard models. For the Swedish Two Counties model, the 10-year baseline death rate of 0.04 was taken and assumed to increase linearly with time. This yielded baseline survival estimates for each year of follow-up. Adjustment for excess diagnosis was made in this analysis by multiplying the probability of death for each screen-detected cancer by the probability of that cancer becoming symptomatic before the end of the follow-up period.
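For the Swedish Two Counties model, the lead-time calculation can be sketched under stated assumptions. The code reads "increase linearly with time" as a cumulative baseline death probability rising linearly to 0.04 at 10 years, applies the proportional hazards relation S(t) = S0(t)^HR, and measures the time at risk as 10 years minus the interval from entry to diagnosis. These readings, the function names and the example values are assumptions for illustration, not the investigators' code.

```python
from math import exp

BASELINE_10Y_DEATH = 0.04  # 10-year baseline death rate quoted in the text


def baseline_survival(years):
    """Baseline survival, assuming cumulative deaths rise linearly to 0.04 at 10 years."""
    years = min(max(years, 0.0), 10.0)
    return 1.0 - BASELINE_10Y_DEATH * years / 10.0


def prob_death_within_10y_of_entry(years_entry_to_diagnosis, hazard_ratio):
    """Probability of death by 10 years after randomisation under proportional hazards."""
    years_at_risk = 10.0 - years_entry_to_diagnosis
    return 1.0 - baseline_survival(years_at_risk) ** hazard_ratio


def adjusted_prob_death(years_entry_to_diagnosis, hazard_ratio,
                        years_detection_to_end, mean_sojourn_time):
    """Screen-detected cancers: weight by the probability that the cancer
    would have become symptomatic before the end of follow-up."""
    p_symptomatic = 1.0 - exp(-years_detection_to_end / mean_sojourn_time)
    return prob_death_within_10y_of_entry(
        years_entry_to_diagnosis, hazard_ratio) * p_symptomatic


# Hypothetical cancer: diagnosed 3 years after entry, hazard ratio 2.5,
# screen detected 1.5 years before the end of follow-up, MST 1.75 years.
print(round(adjusted_prob_death(3.0, 2.5, 1.5, 1.75), 4))
```

Summing these probabilities over the cancers in each arm gives the predicted deaths that enter the risk ratio.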
Variance for the relative risk was calculated using the formula given by Day and Duffy (1996) for each prognostic index (adjusted for the difference in the number of subjects in each arm).

RESULTS
Of the 1303 cases identified in the trial population up to 31.12.1999, 1217 have been reviewed by the pathology review panel. For a further 31 the original pathology report was available, while for 55 it has not been possible to trace the original pathology report or hospital. A total of 16 cases included in analyses of breast cancer incidence (Moss et al, 2005) have been excluded here because the pathology review indicated that these were phyllodes tumour (4), sarcoma (1) or malignant lymphoma (1), or downgraded the lesion to 'atypical hyperplasia' (5) or benign (5). This leaves 1287 cases for inclusion in the analysis. Table 1 shows numbers of breast cancers in each arm of the trial by invasive status and size category, by nodal status, grade and histological type for invasive cancers only. A total of 68 cases (50 in the control arm and 18 in the intervention arm) have been classified as 'advanced' by one of the review panels on the basis of information from the pathology report on clinical details (e.g. treatment by chemotherapy of large tumour, cytology/core biopsy only). These are included in the ≥50 mm category in Table 1. For 47 cases no information on size was available.
The percentage of cases in each of the categories in situ/<10 mm, node negative and grade 1 is higher in the intervention arm than in the control arm. All these differences are highly significant (P<0.001). For all three factors there is a higher percentage of cases with missing data in the control arm.
Rates per 1000 women-years are given in Table 2 for all cancers and for invasive cancers only. Overall there is an excess of invasive breast cancers in the intervention arm (RR = 1.08, 95% confidence interval (CI) 0.95–1.21). Including in situ cancers, the excess is 17% (RR 1.17, 95% CI 1.05–1.32).
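A rate ratio and an approximate confidence interval of this kind can be obtained from case counts and women-years. The sketch below uses the standard large-sample interval on the log scale; whether the published intervals were calculated in this way is not stated, and the counts in the example are invented for illustration.

```python
from math import exp, sqrt


def rate_ratio_with_ci(cases_int, years_int, cases_ctl, years_ctl, z=1.96):
    """Rate ratio (intervention / control) with a log-scale Wald interval.

    Assumption: the standard error of log(RR) is sqrt(1/a + 1/b) for case
    counts a and b, treating the counts as Poisson.
    """
    rr = (cases_int / years_int) / (cases_ctl / years_ctl)
    se_log_rr = sqrt(1 / cases_int + 1 / cases_ctl)
    return rr, rr * exp(-z * se_log_rr), rr * exp(z * se_log_rr)


# Hypothetical counts with twice as many women-years in the control arm.
rr, lower, upper = rate_ratio_with_ci(220, 400_000, 400, 800_000)
print(f"RR {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the arms differ in size, the comparison has to be made on rates per woman-year rather than on raw counts.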
Rates of cases ≥20 mm are 12% lower in the intervention arm than in the control arm (RR 0.88, 95% CI 0.74–1.05). Rates of node-positive cancers are 11% lower (RR 0.89, 95% CI 0.72–1.10), but rates of grade 3 tumours are 6% higher (RR 1.06, 95% CI 0.88–1.27) in the intervention arm. However, none of these differences is statistically significant.
Table 3 gives the numbers falling into the different categories of the three prognostic indices, and the predicted numbers surviving at 10 years from date of diagnosis, both for observed cases, and with the intervention arm adjusted for excess diagnosis. (The Swedish Two Counties index includes in situ cases, whereas the other two indices are restricted to invasive cancers.) Again, for each of the three indices there is a higher percentage of cases in the intervention arm in the best prognostic groups, and the proportion of cancers surviving at 10 years is higher in the intervention arm than the control arm. The percentage predicted to be surviving at 10 years in the intervention arm is lower when the numbers are adjusted for excess diagnosis, because the adjustment excludes a higher proportion of cases in the categories with better prognosis.
Table 4 gives the predicted deaths within 10 years of date of randomisation, with the intervention arm adjusted for excess diagnosis, together with the risk ratios for the intervention arm relative to the control arm. There is a reduction in predicted breast cancer mortality in the intervention arm relative to the control arm of between 10 and 11% depending on the prognostic index used; the estimated reductions are of borderline statistical significance. The sensitivity analyses give ranges for the relative risk (95% CIs) of 0.89 (0.79–1.00) to 0.91 (0.81–1.02) using the NPI, 0.88 (0.79–0.99) to 0.90 (0.81–1.01) using the Edinburgh index and 0.88 (0.77–1.00) to 0.90 (0.79–1.02) using the Swedish Two Counties index.

DISCUSSION
The primary outcome measure of this trial is mortality from breast cancer in the two arms. This analysis of surrogate outcome measures predicts a reduction of 10–11% in breast cancer mortality in the intervention arm of the trial at 10 years from randomisation. The results are consistent for the three prognostic indices that have been used; however, there remains some uncertainty around these predictions.
Estimates of mortality reduction from screening in women under 50 have so far come primarily from subgroup analyses of randomised controlled trials recruiting cohorts of women from ages 40 or 45 upwards; meta-analyses of these trials have estimated a reduction of 18% at an average of 12.7 years of follow-up in women aged 40–49 at entry (Hendrick et al, 1997), but there remains debate as to how much of this effect is due to screening occurring after age 50 (de Koning et al, 1995). The results of these trials are summarised in Table 5, to allow comparison with the UK trial.
There are a number of possible reasons why the effect of screening on mortality in the current trial may be less than that expected on the basis of other randomised trials. The present trial is unique in inviting a cohort of women annually from age 40, and none of the cancers in the present analysis was diagnosed over the age of 50. Sensitivity of screening is likely to be lower at younger ages, due to increased breast density. Our estimate of sensitivity is 74% at the first screening round and 47–64% at later screens (Moss et al, 2005); sensitivity in women aged 40–49 has been estimated as 72–83% in the Swedish Two County study and as 84% in the Gothenburg trial (Bjurstam et al, 1997). Uptake at first screen was 70% in our trial, compared to an uptake of over 80% in the Swedish trials. These factors are likely to reduce the impact of screening on mortality.
The attraction of evaluating screening trials using surrogate end points lies in the ability to conduct earlier analyses, and in more precise estimates of mortality reduction, although the latter assumes no error in the survival probabilities. However, there may be limitations to predicting mortality in this trial using the NPI, as it was developed using a clinical series and largely in older age groups. Nevertheless, the coefficients used here for the NPI are those based on post-1987 cases (after the introduction of adjuvant therapy), and this may explain the greater similarity in outcome prediction between NPI and the Edinburgh index observed here than when both were applied to the Edinburgh randomised trial (Anderson et al, 2000). A number of other potential problems with the use of surrogate end points have been discussed by Morrison (1991), such as variation of prognostic factors within measurement categories, most of which would tend to lead to an underestimate of the effect of screening. The analysis of the Swedish Two County Study found that the predicted effect on mortality in the age group 40–49 years was conservative compared with that observed (Tabar et al, 1995b). The authors hypothesised that this might be attributable to an observed excess of cancers in the intervention arm, possibly due to over diagnosis in this age group, or the existence of a subgroup of women with a long sojourn time. Thus, the mortality reductions predicted here may underestimate the true reduction.
Ideally, predictions based on surrogate outcomes should be based on equivalent numbers of cases in the two arms. In the current analysis, there is an 8% excess of invasive breast cancers in the intervention arm (17% if in situ cancers are included). An excess is likely to remain until women in the control arm are invited for screening as part of the national programme at ages 50–52. The adjustment to predict mortality at 10 years from the date of randomisation as opposed to 10 years from the date of diagnosis will take account of lead-time bias due to the advancement in date of diagnosis of screen-detected cancers. We have attempted to adjust for excess diagnosis, and the effect of this adjustment does not vary greatly with different assumptions concerning the mean sojourn time. Analyses without this adjustment result in lower estimates of mortality reduction ranging between 4 and 6% (data not shown).
There is a 12% reduction in the rate of cancers ≥20 mm in the intervention arm (equivalent to an 18% reduction adjusted for excess diagnosis), and an 11% reduction in node-positive rate (equivalent to a 17% reduction adjusted for excess diagnosis); both these measures have been shown to have a direct relationship with actual relative mortality reduction in randomised trials.
In conclusion, results so far suggest that a reduction in breast cancer mortality in the trial is likely to be observed; however, the size of the reduction is uncertain and awaits definite results on mortality. The first such analysis will be performed when data are available on a mean follow-up of 10 years. Comparison of observed and predicted mortality reductions in this trial (and in the frequency trial (The Breast Screening Frequency Trial Group, 2002)) may provide further insight into the application of surrogate outcome measures. When all women in the control arm have been invited for their first screen at ages 50–52, we should be in a position to make a more accurate prediction of the long-term effect on mortality.

ACKNOWLEDGEMENTS
…, administrative staff at relevant Health Authorities and staff at pathology laboratories, cancer registries and regional quality assurance centres. Contributions from staff at the Trial Coordinating centre are gratefully acknowledged: Derek Coleman, Louise Johns and Nicola Kingston for data processing and analysis, Nicola Bixby and Yvonne Berg for case identification and pathology material collection. The work of the pathologists carrying out the review (together with Prof Tom Anderson), Dr Ian Ellis, Prof John Sloane (deceased) and Dr Lynda Bobrow, is also gratefully acknowledged.