Main

The evidence for reduction in breast cancer mortality associated with screening mammography is based on a number of randomised trials conducted in the 1960s, 1970s and 1980s of individuals or populations invited and not invited to screening. Screening recommendations have been based on the results of such trials from Sweden (Tabár et al, 1992; Frisell et al, 1997), Edinburgh (Alexander et al, 1994), New York (Shapiro, 1997) and Canada (Miller et al, 1992). Meta-analyses of these trials have suggested reductions in breast cancer mortality of 24–31% in those screened (Kerlikowske et al, 1995; Nystrom et al, 1996; Nelson et al, 2016) and of 20–30% in those invited to screening (Tabár et al, 2001, 2003). Meta-analyses that exclude several studies because of possible randomisation bias have failed to show an effect of mammography screening on breast cancer mortality (Gøtzsche and Olsen, 2000; Gøtzsche and Nielsen, 2006, 2009, 2011; Gøtzsche and Jørgensen, 2013). This result is due almost entirely to a negative finding from the Canadian breast screening trial (Freedman et al, 2004). A review by the International Agency for Research on Cancer (IARC) produced a meta-analysis that indicated a pooled relative risk (RR) of 0.75 (25% breast cancer mortality reduction) for invitation to mammography screening in women aged 50–69 years (International Agency for Research on Cancer, 2002, 2016).

Randomised trials based on invitation to screening (as an ‘intention-to-treat’ analysis) may underestimate the benefit of screening participation because of, inter alia, non-adherence in the intervention group and screening in the control group. The effect of actual screening on breast cancer mortality among screening participants has been estimated to be a 35% reduction compared with those not screening (International Agency for Research on Cancer, 2016). However, this effect may be biased by self-selection into screening, in that those who screen may also have lower breast cancer mortality for non-screening reasons (Duffy et al, 2002a).

Service studies of mammography screening using a variety of methodologies in the United Kingdom (Quinn and Allen, 1995; UK Trial of Early Detection of Breast Cancer group, 1999; Blanks et al, 2000; Threlfall et al, 2003), Holland (Broeders et al, 2001; Otto et al, 2003), Finland (Anttila et al, 2002), Sweden (Jonsson et al, 2001; Tabár et al, 2001, 2003; Duffy et al, 2003) and Australia (Taylor et al, 2004, 2009; Roder et al, 2008; Morrell et al, 2012; Nickson et al, 2012) have indicated lower mortality associated with screening compared with non-screened populations, although not all results reached statistical significance. Australian studies have shown significant breast cancer mortality reductions associated with screening mammography using a variety of study designs and analytical approaches (Taylor et al, 2004, 2009; Roder et al, 2008; Morrell et al, 2012).

The purpose of population-based mammography screening programmes is to reduce mortality from breast cancer, and such benefits associated with established screening programmes, including that of New Zealand, need to be evaluated to determine whether this purpose is being achieved in a real-world setting. Other benefits include less extensive surgical treatment, radiotherapy and cytotoxic and other pharmaceutical treatment for cancers that are diagnosed at an earlier stage. These benefits need to be balanced against the potential harms associated with screening mammography, including false positives and subsequent unnecessary investigations, and possible overdiagnosis and accompanying overtreatment (Marmot et al, 2013). It is thus important that the extent of breast cancer mortality benefit from established screening be quantified. In the context of an established evidence-based programme, it is not possible or appropriate to conduct a randomised controlled trial (RCT) of mammography screening to assess its impact on breast cancer mortality. Numerous observational study designs are available to assess breast cancer mortality in relation to participation in the BreastScreen Aotearoa (BSA) programme (Aotearoa being the Indigenous Māori name for New Zealand).

The BSA programme commenced operations in December 1998 targeting women aged 50–64 years, with the target age range extended to 45–69 years in 2004. Full geographic coverage was achieved in 1999. The screening interval is two-yearly, and two-view bilateral mammograms are performed and read independently by two radiologists, with arbitration by a third radiologist or consensus group for conflicting findings. Recruitment to screening is by general health promotion activities, and retention in screening is by personalised reminders to screened women. Digital mammography was first implemented in February 2006 and mammography screening was completely digital by 2013. Cancer detection rates have been within expected levels (40 per 10 000 women screened), and recall-to-assessment rates have also been within the quality assurance guidelines (Page et al, 2014). Biennial screening attendance was 54% for all women (35% for Māori and Pacific women) in the early years of the programme, and by 2011 it had risen to 71% for all women and Pacific women, and 63% for Māori women (Page et al, 2014).

Given the quality and universality of the National Health Index (NHI) linkage key in New Zealand health data, an historical population cohort study of individuals is feasible and was considered to provide the strongest observational study design and largest numbers to assess the effectiveness of the BSA programme. Individual-level screening participation and breast cancer incidence and mortality outcome data were available. A population cohort study offers advantages over a case–control design, in that it circumvents questions concerning selection and appropriateness of controls. The main disadvantage of an historic population cohort design is bias from loss to follow-up, or attrition, especially from outmigration, and inaccuracies in data linkage. However, the procedure in New Zealand for health data linkage is well established and likely to provide reliable results with only a small proportion of mismatches (C Lewis, personal communication).

The aim of this study is to provide an indicator of screening performance in New Zealand based on real-life monitoring data; it is not intended to constitute the underpinning scientific justification for screening, which was established by the RCTs. The hypotheses investigated are that breast cancer mortality is lower in ever-screened women compared with never-screened women, and in women with more compared with less regularity of screening, and that ever-screened women will have prognostic factors at diagnosis of breast cancer indicative of a more favourable outcome than never-screened women.

Materials and methods

Study design and data

This is a retrospective cohort study of breast cancer mortality in New Zealand women in relation to screening mammography. The New Zealand NHI is used to link individual data from the BSA screening service and cancer and death registries, to assemble a cohort comprising all New Zealand women aged 45–69 years during 1999–2011 who were ever screened or diagnosed with breast cancer. Never-screened women not otherwise linkable via the NHI in a given year are inferred by subtraction from ethnic- and age-specific census-derived populations for that year, as provided by Statistics New Zealand (Table 1).

Table 1 Cohort populations, annual breast cancer mortality from breast cancers diagnosed in 1999–2011, and person-years of exposure in ever- and never-screened women aged 45 years, by year of death

Analytic approach

From 2000 to 2011, breast cancer mortality in each year was calculated in relation to screening participation or non-participation, measured in person-years cumulated individually up to the beginning of that year. Breast cancer deaths occurring in a given year originate from breast cancers diagnosed cumulatively from 1999 onwards. Women were followed from the time they first became eligible to screen, as a ‘screening-inception’ cohort with differing person-years based on participation and non-participation in BSA mammography. In ever-screened women, person-years of participation in screening are calculated from the time of first screen to the beginning of each successive year, or to the year of diagnosis for women diagnosed with breast cancer, since screening participation postdiagnosis is not relevant to breast cancer mortality in this study. Every woman for whom individual data were available was classified as not screened from the time she first became eligible to screen (i.e., at age 50 years before 2004, and at 45 years from 2004) to the time she first screened. This period contributed to the woman’s person-years of not screening with BSA. The time from her first screen to (i) the end of the study period, (ii) her death or (iii) her first diagnosis of breast cancer, whichever came first, counted as person-years since first participating in screening. Thus, at the beginning of each year, each woman may contribute some person-years of cumulated participation and non-participation in screening, depending on her age, when she first became eligible to screen, when organised screening commenced and the screening target age range over the period of interest.

For each successive year, cumulated person-years of participation and non-participation in screening (while eligible) are recalculated taking into account women newly screened since the previous year, those who become eligible by ageing into the screening target age group in that year and those ageing out of the screening target age group in that year (along with women diagnosed with breast cancer or who have died). For 1999–2003, person-years of participation or non-participation in screening are calculated for the target group of women aged 50–64 years. From extension of the target age range in 2004, the calculation is for women aged 45–69 years. The overall approach is illustrated in Figure 1. For the remainder of this article, we use the term ‘screening exposure’ synonymously with ‘participation in screening’.

Figure 1

Individual examples of person-year contributions to screening exposure and non-exposure under a variety of screening participation scenarios. Shown are the cumulated person-years of participation and non-participation in screening up to the end of 2003 (vertical line) in a hypothetical cohort of 20 women aged 40–65 years in 1999. Cumulated person-years contributing to participation total 18, and cumulated person-years from first eligibility contributing to screening non-participation total 30.

For women with individual data but no recorded screening participation, person-years of non-participation in screening were calculated as above. For women without individual data (i.e., the remaining female population with no recorded screening or breast cancer history), population counts were supplied in 5-year age groups, and person-years of non-participation were estimated by subtracting the screening commencement age from the median age of each 5-year age group in a given year. Person-years formed the denominators for subsequent analyses of those ever- or never-screened up to the given year of interest.
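The person-year bookkeeping described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names (`first_eligible`, `cumulated_person_years`) are hypothetical, time is measured in whole calendar years, and ageing out of the target age range is ignored for brevity. Only the eligibility rules (age 50 before 2004, age 45 from 2004, counting from 1999) and the censoring of screening exposure at diagnosis or death are taken from the text.

```python
from typing import Optional, Tuple

PROGRAMME_START = 1999.0  # follow-up in the paper is counted from 1999
AGE_EXTENSION = 2004.0    # target age range widened from 50-64 to 45-69 years


def first_eligible(birth_year: float) -> float:
    """Calendar time at which a woman first enters the screening target range."""
    at_50 = birth_year + 50
    if at_50 < AGE_EXTENSION:
        # Eligible at 50, but not before the programme started.
        return max(at_50, PROGRAMME_START)
    # Otherwise eligible at 45, but not before the 2004 age extension.
    return max(birth_year + 45, AGE_EXTENSION)


def cumulated_person_years(
    birth_year: float,
    first_screen: Optional[float],
    censor: Optional[float],
    as_of: float,
) -> Tuple[float, float]:
    """Return (non-participation, participation) person-years up to `as_of`.

    `censor` is the time of breast cancer diagnosis or death, if any;
    exposure stops accumulating at that point.
    """
    start = first_eligible(birth_year)
    stop = as_of if censor is None else min(as_of, censor)
    if stop <= start:
        return (0.0, 0.0)
    if first_screen is None or first_screen >= stop:
        return (stop - start, 0.0)
    screen = max(first_screen, start)
    return (screen - start, stop - screen)


# Eligible from 1999, first screened in 2001, evaluated at the start of 2004.
print(cumulated_person_years(1945, 2001, None, 2004))
# Becomes eligible only with the 2004 extension; never screened.
print(cumulated_person_years(1956, None, None, 2008))
```

Summing these pairs over all women at the start of each year would give the ever- and never-screened denominators; the census-derived remainder population would need the separate median-age approximation described in the text.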

The outcome variable is breast cancer mortality occurring over 2000–2011 originating only from cancers diagnosed during 1999–2011 in women in the screening target age groups. With this approach, lead-time bias does not affect the estimates of breast cancer mortality risk in screened compared with unscreened women, as breast cancer mortality is examined in relation to time since first participating in screening, and time not participating in screening from the time of first screening eligibility, not time from diagnosis as in a clinical cohort from incident cases.

Mortality analyses

Relative risks for breast cancer mortality are determined by comparing mortality in those ever screened with that in the never screened (reference, RR=1.00), adjusted by regression analysis for confounding by 5-year age group at death (45–49 to 75–79 years) and ethnicity: Māori, Pacific and Other women. Māori are the indigenous people of New Zealand and comprise 9.6% of the female population aged 45–69 years (2006 Census). Pacific women comprise migrants, or their descendants, from Pacific Island states, especially Samoa, Tonga, the Cook Islands and Niue, and make up 3.9% of the female population aged 45–69 years (2006 Census). The main (‘Other’) component of the population consists predominantly of New Zealanders of European descent. Ethnicity in New Zealand is based on self-designation. In this study, a woman is classified as Māori (or Pacific) if she was ever so designated in any of the ethnicity indicator variables available across data sources (‘prioritised’ ethnicity). Adjustment for age group and ethnicity accounts for differences in the distribution of these groups, whose breast cancer mortality varies for reasons other than screening, among those never screened, ever screened, irregularly screened or regularly screened.

Analyses are also undertaken of the effects of screening regularity. Regular screening is defined as having been screened at least three times with a mean screening interval of 30 months or less, as previously used in a South Australian case–control study (Roder et al, 2008). Irregular screening is defined as ever screened, but not conforming to this definition of regular screening. Breast cancer mortality in these categories was compared with that in the never screened. To test for a screening dose–response relationship, screening regularity was modelled as an ordinal covariate (with a single degree of freedom) in a regression analysis, with the significance of the β-estimate serving as the test of trend across never-, irregularly- and regularly-screened women.

Statistical modelling was by negative binomial regression adjusting for repeated measures, as breast cancer mortality in a largely similar population was analysed repeatedly each year to account for changing screening status in the population. Where negative binomial models failed to converge, Poisson regression is used and standard errors corrected for overdispersion. Counts of breast cancer deaths are the outcome and the offset is the log of person-years of exposure or non-exposure to screening. With increasing time, the total person-years of exposure and non-exposure to screening cumulate; correspondingly, annual breast cancer mortality counts derive from a cumulation of incident breast cancer cases diagnosed over the same time period preceding the given year of death, as occurs in all incidence-based analyses of breast cancer mortality. SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) is used for statistical analyses.
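As a reading aid, the count model described above might be written as follows. This is a sketch, not the authors' specification: the indices (g for a screening, age and ethnicity stratum, t for calendar year), the covariate coding and the NB2 variance form are assumptions, and the repeated-measures correlation structure is not shown. Only the outcome (death counts), the log person-years offset and the adjustment variables come from the text.

```latex
% d_{gt}: breast cancer deaths in stratum g, year t; PY_{gt}: cumulated person-years
\ln \mathrm{E}[d_{gt}] = \ln(PY_{gt}) + \beta_0 + \beta_1\,\mathrm{screened}_{g}
      + \sum_{a} \gamma_a\,\mathrm{age}_{ag} + \sum_{e} \delta_e\,\mathrm{ethnicity}_{eg},
\qquad RR = e^{\beta_1}

% Negative binomial variance (assumed NB2 form); \alpha \to 0 recovers the Poisson model
\mathrm{Var}(d_{gt}) = \mu_{gt} + \alpha\,\mu_{gt}^{2}, \qquad \mu_{gt} = \mathrm{E}[d_{gt}]
```

Because the offset enters with a coefficient fixed at 1, the exponentiated coefficient on the screening indicator is a ratio of mortality rates per person-year, which is how the RRs are interpreted.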

Screening selection bias

Relative risks are further adjusted for screening selection bias, which results from comparison of breast cancer mortality in screened women compared with those who do not participate in screening despite its availability (Duffy et al, 2002a). In some populations, the latter have been shown to have higher breast cancer mortality compared with women not offered screening, some of whom would screen if it were available. Such comparisons can thus inflate estimates of mortality reduction from screening compared with a population never offered screening, and corrections have been used frequently in previous studies (Duffy et al, 2002b; Gabe and Duffy, 2005; Swedish Organised Service Screening Evaluation Group, 2006) to produce estimates of screening effects that are comparable to results of randomised trials. The correction relies on: (1) empirical estimates from RCTs and screening service studies of breast cancer mortality RR in women not screening when offered compared with women not offered screening (Dr); (2) an estimate of population-based screening participation (p); and (3) the empirical RR estimates of never- vs ever-screening derived from the Poisson or negative binomial modelling (RRder) from the present study of New Zealand women. The resulting adjusted RR estimate is an intention-to-treat estimate and represents the RR in a population where screening is available or offered compared with a population where screening is unavailable or not offered. In detail (Duffy et al, 2002a):

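The adjusted RR is

$$RR_{adj} = D_r \times \left[\, p \times RR_{der} + (1 - p) \,\right]$$

The variance for RRadj is calculated from

$$V\{\ln(RR_{adj})\} = V\{\ln(D_r)\} + \left[ \frac{p \times RR_{der}}{p \times RR_{der} + (1 - p)} \right]^{2} V\{\ln(RR_{der})\}$$

The standard error is then

$$SE\{\ln(RR_{adj})\} = \sqrt{V\{\ln(RR_{adj})\}}$$

The upper and lower 95% confidence intervals of ln(RRadj) are

$$\ln(RR_{adj}) \pm 1.96 \times SE\{\ln(RR_{adj})\}$$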

and these are exponentiated to produce the upper and lower 95% confidence intervals of RRadj.

The excess breast cancer mortality of women not screening over women not offered screening, reported in the literature and used in the present study, is an RR of 1.17 from the Swedish screening service studies (Swedish Organised Service Screening Evaluation Group, 2006). This estimate is considered appropriate for evaluation of the New Zealand screening programme because it derives from a service screening environment similar to that of New Zealand. The variance of ln(Dr), V{ln(Dr)}, is estimated as 0.0014995, derived from the 95% confidence interval reported for the overall estimate of Dr (Swedish Organised Service Screening Evaluation Group, 2006). V{ln(RRder)} is provided directly by the regression outputs from the present analyses, as the square of the standard error of the regression estimate.

Adjustment for screening selection bias also incorporates screening coverage (Duffy et al, 2002a). Accordingly, adjusted estimates for different screening participation rates are derived from the mean recorded participation rate for: 2001–2011 (64%), the most recent period 2012–2013 (71%; Page et al, 2014), and the screening participation target of 70%.
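The selection-bias correction can be sketched in Python as below. Two assumptions need flagging. The functional form used here, RRadj = Dr × (p × RRder + 1 − p) with a delta-method variance on the log scale, is the intention-to-treat form assumed for this illustration. The example inputs (RRder = 0.38 with 95% CI 0.30–0.49) are back-calculated from the reported 62% (51–70) mortality reduction rather than taken from regression output. The function names are hypothetical.

```python
import math


def var_ln_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Variance of ln(RR) recovered from a 95% confidence interval."""
    return ((math.log(upper) - math.log(lower)) / (2 * z)) ** 2


def adjust_for_selection_bias(
    rr_der: float,
    var_ln_rr_der: float,
    p: float,
    dr: float = 1.17,               # RR, non-attenders vs women not offered screening
    var_ln_dr: float = 0.0014995,   # V{ln(Dr)} as quoted in the text
    z: float = 1.96,
):
    """Return (rr_adj, lower, upper) under the assumed intention-to-treat form."""
    mix = p * rr_der + (1 - p)
    rr_adj = dr * mix
    # Delta method: d ln(mix) / d ln(rr_der) = p * rr_der / mix
    weight = p * rr_der / mix
    se = math.sqrt(var_ln_dr + weight ** 2 * var_ln_rr_der)
    lower = math.exp(math.log(rr_adj) - z * se)
    upper = math.exp(math.log(rr_adj) + z * se)
    return rr_adj, lower, upper


# Illustrative inputs back-calculated from the reported 62% (51-70) reduction.
v = var_ln_from_ci(0.30, 0.49)
rr, lo, hi = adjust_for_selection_bias(0.38, v, p=0.64)
print(f"RRadj = {rr:.3f} (95% CI {lo:.3f}-{hi:.3f})")
print(f"Mortality reduction = {100 * (1 - rr):.0f}%")
```

With these inputs the sketch gives a mortality reduction of about 29%, with an interval of roughly 20–38%, in line with the figures reported for 64% participation. That agreement supports the assumed form but is not a substitute for the source derivation.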

Prognostic indicators

Prognostic indicators at diagnosis of breast cancer are compared between never- and ever-screened groups, and between irregularly- and regularly-screened groups. Prognostic indicators included grade of tumour, extent of disease (spread), multiple tumours and maximum tumour size. To assess possible downshifting of cancer stage from screening, partly attributable to possibly inconsequential cancers, the proportions of cancers with regional or distant spread are also examined. In particular, the proportion of cancers with distant spread among cancers with distant or regional spread was compared between the screening comparison groups, so that differences in the proportion of distant cancer could not be attributed solely to a screening-related inflation of localised cancer.

Chi-squared, two-sample median and t-tests for differences in proportions, medians and means, respectively, are used for testing ever- and never-screened differences in prognostic indicators.

Results

Breast cancer mortality reduction in relation to mammography screening

Breast cancer mortality for 2000–2011 (from cancers diagnosed from 1999) was 23.5 per 100 000 for ever screened (deaths=873, n=3 707 484) and 65.0 per 100 000 for never screened (deaths=3511, n=5 405 518). The unadjusted mortality ratio is thus 0.36, indicating a mortality reduction from screening of 64% (Table 1). Using person-years as the denominator and adjusting for age and ethnicity by negative binomial regression, breast cancer mortality in ever-screened women is estimated to be 62% (95% CI: 51–70) lower than in never-screened women (Table 2). After further adjustment for screening selection bias, the mortality reduction is estimated to be 29% (95% CI: 20–38) at the average screening participation of 64% for 2001–2011. For recent (2012–2013) screening coverage (71%), the mortality reduction is estimated as 34% (95% CI: 25–43).
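The unadjusted figures quoted above follow directly from the death counts and denominators, as this short Python check shows. Only the four numbers from the text are used; the helper name is illustrative.

```python
def rate_per_100k(deaths: int, denominator: int) -> float:
    """Crude mortality rate per 100 000."""
    return deaths / denominator * 100_000


ever = rate_per_100k(873, 3_707_484)     # ever-screened women
never = rate_per_100k(3511, 5_405_518)   # never-screened women
ratio = ever / never                      # unadjusted mortality ratio

print(f"ever screened:  {ever:.1f} per 100 000")
print(f"never screened: {never:.1f} per 100 000")
print(f"ratio: {ratio:.2f}, reduction: {100 * (1 - ratio):.0f}%")
```

This crude ratio takes no account of age, ethnicity or self-selection into screening, which is why the regression-adjusted and selection-bias-adjusted estimates are the ones carried forward.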

Table 2 Adjusted relative risk of breast cancer mortality in ever- and never-screened New Zealand women, 1999–2011

Compared with never-screened women, breast cancer mortality is estimated to be 58% (95% CI: 48–66) lower in irregularly screened women and 67% (95% CI: 46–81) lower in regularly screened women, based on the RRs (Table 3). Regular screening in this analysis is defined as having been screened at least three times with a mean screening interval of 30 months or less. Age at first screen, to indicate earlier or later age at first exposure to screening, is controlled for, along with ethnicity, to minimise confounding.

Table 3 Adjusted relative risk of breast cancer in regularly, irregularly and never-screened New Zealand women, breast cancers diagnosed 2003–2011

After adjustment for screening selection bias, the mortality benefit in women screened less regularly is estimated to be 26% (95% CI: 17–35) for screening participation of 64% for 2001–2011. Based on the most recent (2012–2013) screening rate of 71%, the mortality reduction is estimated to be 31% (95% CI: 21–40), similar to that for the screening participation target of 70%. The mortality benefit attributable to regular screening, adjusted for screening selection bias, is estimated as 33% (95% CI: 18–45) based on screening participation for 2001–2011, and 39% based on the 2012–2013 screening coverage (71%). The test for trend across screening groups was significant (P<0.0001), confirming a screening dose–response relationship in breast cancer mortality (Table 3 and Figure 2).

Figure 2

Differences (%)† in breast cancer mortality by mammography screening group, New Zealand women, 1999–2011. Adjusted for age and ethnicity by regression; and adjusted for screening selection bias (Duffy et al, 2002a) assuming relative risk in non-screeners to women not offered screening=1.17 and recorded screening participation rate of 71% for 2012–2013. §Trend test of regression estimates.

Prognostic factors in diagnosed breast cancer in relation to mammography screening

Ever-screened women diagnosed with breast cancer had more favourable prognostic indicators than never-screened women on all measures in 2000–2011 (Table 4). A significantly higher proportion of ever-screened women diagnosed with breast cancer had well-differentiated tumours (30%) compared with 18% of never-screened diagnosed women (P<0.0001), and 63% of diagnosed ever-screened women had localised cancer compared with 46% of never-screened women (P<0.0001). These proportions and their differences are similar when recalculated excluding unknown or not recorded categories.

Table 4 Prognostic indicators for breast cancers diagnosed in ever- vs never-screened New Zealand women aged 45–69 years at year of diagnosis, 2000–2011

The proportion of breast cancers with distant spread, of distant plus regional spread, was 4.9% in ever-screened women and 11.3% in never-screened women, which equates to a corresponding RR for ever-screened women of 0.44 (95% CI: 0.37–0.52). This indicates that the lower proportion of distant cancer in ever-screened women was not entirely attributable to screening-related inflation of localised cancer.

Of ever-screened women diagnosed with breast cancer, 1.8% had multiple tumours compared with 3.8% of never-screened women, giving RR=0.48 (95% CI: 0.41–0.56) relative to never-screened women (RR=1.00). The median maximum tumour size in ever-screened women with breast cancer was 15 mm compared with 20 mm in corresponding never-screened women (significantly smaller, P<0.0001). Except for grade of tumour, prognostic indicators were significantly better in regularly than in irregularly screened women for the degree of spread (localised 67% compared with 60%), multiple tumour status (RR=0.57) and maximum tumour size (median 14 mm vs 15 mm) (Table 5).

Table 5 Prognostic indicators for breast cancers diagnosed in regularly vs irregularly screened New Zealand women aged 45–69 years at year of diagnosis, 2003–2011

Discussion

All initial hypotheses proposed in this study were confirmed. Breast cancer mortality in New Zealand was lower in women ever screened by BSA compared with women never screened by BSA during 1999–2011, and lower in women with higher compared with lower regularity of screening; and ever-screened women show more favourable breast cancer prognostic factors than never-screened women. There is an evident dose–response relationship in breast cancer mortality reduction from never, to irregular, then to regular screening mirrored by differences in prognostic indicators in diagnosed cancers. Further, the documented mortality reductions from service screening (adjusted for screening selection bias) in this study are similar to the findings of meta-analyses of RCTs (Kerlikowske et al, 1995; Nystrom et al, 1996; Tabár et al, 2001, 2003; International Agency for Research on Cancer, 2016; Nelson et al, 2016) and previous studies of service screening (Broeders et al, 2012).

It is evident that the main source of the breast cancer mortality benefit in ever-screened women is the earlier detection of cancer, as indicated by more favourable prognostic indicators than in never-screened women. In particular, the mortality benefit from screening has stemmed largely from higher proportions with localised summary stage at diagnosis, and correspondingly lower proportions with regional or metastatic spread in the ever-screened compared with the never-screened women. Importantly, cancer with distant spread as a proportion of cancer with distant or regional spread at diagnosis is also significantly lower in ever-screened compared with never-screened women, which indicates an absence of artefactual stage shifting from detection by screening of possibly inconsequential subclinical cancers.

Tumour size, tumour grade, nodal status and degree of spread are inter-related. Well-differentiated cancer (grade), found in a significantly higher proportion of ever-screened than never-screened women with cancer, has been shown to be correlated with smaller tumour size (Duffy et al, 1991). Higher proportions of smaller tumours and of lower-grade (less dedifferentiated) cancer in screened compared with non-screened women have been shown to be a consequence of early diagnosis rather than length bias (Duffy et al, 1991). The mortality findings are consistent with the prognostic indicators, and the evidence for a dose–response relationship is strong, with breast cancer mortality being lowest in regularly screened women, higher in irregularly screened women and highest in never-screened women.

The most direct and understandable outcome measure for this analysis is breast cancer mortality occurring each year, based on screening exposure up to the beginning of that year in women not diagnosed with breast cancer, and up to the beginning of the year of diagnosis in women with breast cancer. Some women classified as never screened up to a particular year will subsequently become screened. The subsequent screening exposure, along with past non-screening exposure, will be relevant to subsequent breast cancer mortality, weighted by the time in each exposure category. This approach is not subject to lead-time bias as the screening exposure and non-exposure are measured as person-years from time of screening eligibility, and time of diagnosis of breast cancer is not used in the analysis. Breast cancer mortality analysed here is confined only to those breast cancer cases diagnosed after the advent of population screening mammography (1999).

While survival analysis of breast cancer cases based on time since diagnosis can be adjusted for competing causes of death and other factors, controlling for lead-time bias due to screening is exceedingly difficult. The present study has avoided artefactual contributions from lead time by not examining survival postdiagnosis but rather breast cancer mortality in relation to person-time exposed to screening up to the beginning of each successive year. Breast cancer mortality over that year is then examined. This process was repeated for each year successively, in a repeated-measures analysis of yearly breast cancer deaths.

As in most screening service studies, which by nature are observational, the factors contributing to differences in breast cancer mortality between those who participate in screening compared with those who do not cannot all be known or measured. Compared with women who do not screen despite its availability, it is possible that women who do screen when screening is available also have other (unmeasured and unknown) characteristics that may contribute to lower breast cancer mortality (in addition to screening itself). Although publicly funded screening and treatment services are available nationwide in New Zealand, there is some evidence of differential access or quality of care by screen-detection status and ethnic group. Surgery delays have been reported as more common in non-BSA diagnosed women and among Māori, and surgery delay in the public sector was also found to be more likely than in the private sector (Seneviratne et al, 2015; Lawrenson et al, 2016). Māori women have also been observed as less likely to have radiotherapy and/or adhere to long-term adjuvant endocrine therapy, and more likely to have a mastectomy, than non-Māori women (Lawrenson et al, 2016).

Screening selection bias may also be affected in New Zealand by initial recruitment to screening through media campaigns and other health promotion activities, conducted by the Ministry of Health, rather than individual, personalised recruitment. Women more likely to respond to general health promotion activities may have lower risk of breast cancer mortality through lower breast cancer incidence and/or case fatality. While lower case fatality depends on lower cancer stage and grade, the main mechanism through which screening mammography lowers breast cancer mortality, as shown in the present study, is that breast cancers in screened women have better prognostic indicators than those in unscreened women. That is, lower case fatality is shown to be consistent with earlier detection through screening.

The principal advantage of adjustment for screening selection bias is that it accounts for possible differences between women who screen compared with women who do not screen, despite the availability of screening. An additional advantage is that this adjustment produces an estimate of the effects of screening mammography on breast cancer mortality in a population offered screening compared with a population not offered screening, to relate findings from screening service evaluation studies to RCTs in which entire populations offered screening are compared with populations not offered screening. This correction has been used extensively in published studies of screening evaluations, both in cohort studies (Tabár et al, 2001; Duffy et al, 2002b; Gabe and Duffy, 2005; Swedish Organised Service Screening Evaluation Group, 2006; Lawrence et al, 2009; Hofvind et al, 2013) and in case–control studies (Allgood et al, 2008; Puliti et al, 2008; Roder et al, 2008; Nickson et al, 2012; Otto et al, 2012; Paap et al, 2014; van der Waal et al, 2015).

A potential weakness in the present analysis is the use of mortality differentials from Swedish service screening studies to adjust for screening selection bias. The Swedish screening service studies found the RR of breast cancer mortality in women not screening, in spite of it being offered, compared with women not offered screening, to be 1.17 (Swedish Organised Service Screening Evaluation Group, 2006). This RR estimate may differ from that applicable to the New Zealand population, although the estimates of breast cancer mortality unadjusted for screening selection bias indicate that the anticipated direction of the effect of screening selection bias is correct, and its application produces results that have face validity compared with data from the original trials. We have used the Swedish RR estimate in our study because it is readily available and because the screening programmes in Sweden and New Zealand have apparent similarities. The aim of this study was to provide an indicator of screening performance in New Zealand based on real-life monitoring data; it is not intended to constitute the underpinning scientific justification for screening that was established by the RCTs.

Published evaluations of mammography service screening include case–control and cohort studies, although several cohort studies use a quasiexperimental study design with categorical exposure to screening (or non-screening) in aggregate for all individuals, such as before and after studies in the same population, or contemporaneous comparisons of populations in different geographic areas. Although the outcome is measured as a cohort mortality rate, the exposure is ecological. Such studies are susceptible to bias and confounding because of different characteristics of comparative populations that usually are not randomly selected.

Cohort studies of service screening using screening exposure measured in individuals have been conducted in Finland (Hakama et al, 1997; Anttila et al, 2002), Denmark (Olsen et al, 2005) and Sweden (Duffy et al, 2002b). The Finnish studies, focusing on women aged 50–59 years, used linked screening, cancer and death registry data to examine breast cancer mortality outcomes. Since screening was implemented in Finland in different municipalities at different times, with women invited according to even or odd year of birth, such quasirandomised cohorts can be compared.

A Finnish study, published in 1997, of breast cancer mortality from cancers diagnosed only after the implementation of screening found 24% lower breast cancer mortality (95% CI (RR): 0.53–1.09; marginally nonsignificant) in women invited to screening during 1987–1989 compared with those not invited (Hakama et al, 1997). For invited women aged under 56 years, breast cancer mortality was found to be 44% lower (95% CI (RR): 0.33–0.95; significant). This study was not affected by screening selection bias.

A later individual-based Finnish study (Anttila et al, 2002) of service screening in Helsinki, published in 2002, compared birth cohorts not offered screening (born 1930–1934) with birth cohorts offered screening who were exposed to most screening rounds with the longest follow-up time (born 1935–1939). This study found 19% lower breast cancer mortality (95% CI (RR): 0.62–1.05; marginally nonsignificant) in the screened cohort after adjusting for screening selection bias (Anttila et al, 2002). When the Finnish results were combined in a meta-analysis, breast cancer mortality in women invited to screen was estimated to be 23% (95% CI (RR): 0.40–1.00) lower than in uninvited women (statistically significant) (Irvin and Kaplan, 2014).
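For readers unfamiliar with how RR estimates from separate studies are combined, a minimal sketch of fixed-effect inverse-variance pooling on the log RR scale is given below. It recovers each standard error from the reported 95% CI and weights each study by the inverse of its variance. The choice of a fixed-effect model and the function names are assumptions for illustration. The cited meta-analysis may have used a different model and different inputs, so this sketch is not expected to reproduce its published estimate.

```python
import math


def log_rr_and_se(rr: float, lo: float, hi: float, z: float = 1.96):
    """Return log(RR) and its standard error recovered from a 95% CI."""
    se = (math.log(hi) - math.log(lo)) / (2 * z)
    return math.log(rr), se


def pool_fixed_effect(estimates, z: float = 1.96):
    """Fixed-effect inverse-variance pooling of (rr, ci_low, ci_high) tuples.

    Returns the pooled RR with its lower and upper 95% confidence limits.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, lo, hi in estimates:
        log_rr, se = log_rr_and_se(rr, lo, hi, z)
        weight = 1.0 / se ** 2          # inverse-variance weight
        weighted_sum += weight * log_rr
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * pooled_se),
        math.exp(pooled_log + z * pooled_se),
    )


# RRs implied by the text: 24% lower -> 0.76; 19% lower -> 0.81
finnish = [
    (0.76, 0.53, 1.09),  # Hakama et al, 1997
    (0.81, 0.62, 1.05),  # Anttila et al, 2002
]
pooled_rr, ci_low, ci_high = pool_fixed_effect(finnish)
print(f"pooled RR {pooled_rr:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The pooled estimate always lies between the individual study estimates, and its CI is narrower than that of either study alone, which is how two marginally nonsignificant results can yield a significant combined estimate.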

A Danish cohort study of Copenhagen women aged 50–69 years with individual measurement of screening exposure for the first decade of screening (1991–2001), with up to 10 years of follow-up, found 25% (95% CI: 0.63–0.89) lower breast cancer mortality in those invited for screening, compared with women not invited for screening (Olsen et al, 2005). The result was derived from comparisons with a contemporaneous national cohort of uninvited Danish women, historical cohorts of Copenhagen women in the decade before screening and historical cohorts of Danish women. Breast cancer mortality was found to be 37% (approximate 95% CI: 0.52–0.77) lower in women who actually screened compared with those not screening, and after adjustment for screening selection bias by Gabe and Duffy (2005), it was 30% lower.

In a follow-up of mammography screening in Florence (Italy), breast cancer mortality was estimated to be 19% lower (borderline significant) in invited women diagnosed with breast cancer between 1990 and 1993 (follow-up to 1999) compared with that in women not yet invited to screening (Paci et al, 2002b). However, breast cancer mortality in ‘uninvited’ women was estimated by application of case fatality rates to breast cancers expected to be diagnosed in the population not yet invited to screening.

In a Swedish study (Duffy et al, 2002b) published in 2002, breast cancer mortality in invited women was found to be 30% lower than in uninvited women from cancers diagnosed entirely in the screening epoch (comparable to the present study). Breast cancer mortality in women who actually screened was found to be 39% lower compared with that in women who did not screen (statistically significant), after adjusting for screening selection bias (Duffy et al, 2002b).

In a systematic review by Gabe and Duffy (2005), results of both aggregate and individual-based cohort studies were meta-analysed and breast cancer mortality RR was estimated as 0.74 (26% lower) for invitation to screening, after adjustment for potential confounding and bias (11 studies; Peer et al, 1995; Dijck et al, 1997; Hakama et al, 1997; UK Trial of Early Detection of Breast Cancer Group, 1999; Jonsson et al, 2000, 2001; Anttila et al, 2002; Paci et al, 2002a; Duffy et al, 2002b; Tabár et al, 2003; Olsen et al, 2005). For actual screening, a breast cancer mortality RR of 0.57 was estimated (mortality reduction 43%; five studies; Gabe and Duffy, 2005). After further adjustment for screening selection bias, a combined 32% breast cancer mortality reduction was found to be associated with screening attendance (from 10 studies; Gabe and Duffy, 2005). However, it was not stated explicitly which value of RR for unscreened relative to uninvited women was used for adjusting for screening selection bias.

Since the review by Gabe and Duffy (2005), similar results were produced from an individual-based cohort study in Norway of women aged 50–79 years over 1986–2009 (Weedon-Fekjær et al, 2014), showing 28% lower breast cancer mortality in women invited to screening than in women not invited, after adjusting for confounders. Another Norwegian study comparing cohorts in early- and late-starting mammography screening areas found somewhat smaller and nonsignificant differences using differing historical and/or regional controls (RR range 0.89–0.93). However, national survey data indicated that 40% of women had already participated in regular screening before the introduction of screening mammography in Norway (Olsen et al, 2013), and 64% of first attendees reported having had a mammogram before participating in the organised screening program (Lynge et al, 2011; Olsen et al, 2013).

A further meta-analysis of quasiexperimental aggregate studies (Irvin and Kaplan, 2014), including the Norwegian study above (Weedon-Fekjær et al, 2014), with incidence-based breast cancer mortality as the outcome for women aged 50–69 years, showed breast cancer mortality to be: 43% lower in historical comparisons (two studies; Swedish Organised Service Screening Evaluation Group, 2006; Ascunce et al, 2007); 22% lower in geographical comparisons (two studies; UK Trial of Early Detection of Breast Cancer Group, 1999; Jonsson et al, 2007); and 13% lower in geographical–historical comparisons (five studies; Jonsson et al, 2001, 2003; Olsen et al, 2005, 2013; Parvinen et al, 2006). Each of these meta-analysed estimates differed statistically significantly from no difference in breast cancer mortality.

An aggregate cohort evaluation of BreastScreen Australia for 1990–2004, using small area incidence-linked breast cancer mortality, correlated with lagged mammography screening participation rates in each area and each year (Poisson regression analysis), or breast cancer mortality subsequent to screening in these areas and years (Cox proportional hazard regression), found 25–34% lower breast cancer mortality associated with screening mammography (respectively), adjusted for confounders when projected to the target screening participation of 70% (Taylor et al, 2009; Morrell et al, 2012).

Data linkage for BSA has allowed an unprecedented examination of the efficacy of screening mammography in a real-world population setting. While mismatches based on the NHI are rare, the extent of duplicate (i.e., different) NHIs applying to the same person across different data sources is not well established; however, records are de-duplicated when discovered (Lewis, 2016 personal communication). Any undetected duplication may therefore bias the screening estimates derived in the present study away from the null. For instance, a woman with one NHI recorded on the BSA screening register and another NHI recorded on the death and cancer registers would be misclassified as ever-screened and still alive, thus favouring the ever-screened. However, the likelihood of these exceptions is very low (Lewis, 2016 personal communication).

Apart from the Finnish, Danish and Swedish studies, the authors are not aware of other cohort studies of established screening mammography that have used individual-based information on screening exposure linked with cancer diagnosis and mortality from breast cancer. Further, to our knowledge no previous individual cohort studies have taken account of changes in screening exposure during follow-up of study populations, and none have compared regular screening with less frequent screening and never screening to examine a dose–response effect.

It is probable that a small proportion of women classified as never-screened with BSA have screened privately, or participated in screening pilots conducted in the early 1990s without subsequently screening with BSA. The extent of private opportunistic screening in New Zealand outside BSA is difficult to determine because private mammography is not subsidised or provided without charge, and is paid for by the individual or their insurance company. Consequently, it would be expected that de facto or previous pilot screening in never-BSA screened women, who would be misclassified as never screened, would bias any mortality benefits found in screened women toward the null.
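The direction of this bias can be illustrated with a simple calculation. The sketch below assumes that a fraction q of women classified as never-screened were in fact screened elsewhere and experienced the same breast cancer mortality as BSA-screened women. The observed RR is then RR / ((1 − q) + q·RR), where RR is the true value. The function name and all numerical values are hypothetical and are not estimates from the present study.

```python
def observed_rr(true_rr: float, q: float) -> float:
    """RR observed when a fraction q of the 'never-screened' group was screened.

    Assumption (labelled, not from the source): misclassified women have the
    same mortality as correctly classified screened women, so the comparison
    group's rate is a mixture of unscreened and screened rates.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must lie in [0, 1]")
    # Comparison-group rate relative to truly unscreened women
    contaminated_reference = (1 - q) + q * true_rr
    return true_rr / contaminated_reference


# Hypothetical true RR of 0.70 under increasing misclassification
for q in (0.0, 0.05, 0.10, 0.20):
    print(f"q = {q:.2f}: observed RR = {observed_rr(0.70, q):.3f}")
```

For any true RR below 1, the observed RR moves toward 1 as q increases, so misclassification of this kind understates rather than overstates the mortality benefit.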

While case–control studies do not provide the strongest evidence for causation, they use only exposures and outcomes measured in individuals. For this reason, they are not subject to the same levels of misclassification of exposure that can beset aggregate studies or quasiexperimental cohort studies. An advantage of case–control studies over cohort studies is that they are not affected by attrition, which can produce biased estimates of the effect if loss to follow-up is systematically associated with exposure to the study factor.

When comparing screened with unscreened women, results from case–control studies are consistent with those found in the present study. In the review by Gabe and Duffy (2005) of seven case–control studies, the estimated odds ratios (OR) of breast cancer death in screened compared with unscreened women ranged from 0.42 to 0.75. A case–control study in South Australia found an OR of 0.59 (95% CI: 0.47–0.74) for screening participants compared with non-participants, which was 0.70 when corrected for screening participation bias, similar to the present study (Roder et al, 2008).

There is some disagreement over the relative contributions to the population decline in breast cancer mortality of organised mammography screening and of improvements in breast cancer treatment, especially the addition to surgery, and/or extended use, of chemotherapy and tamoxifen therapy for the treatment of primary breast cancer (Early Breast Cancer Trialists’ Collaborative Group (EBCTCG), 2005; Burton et al, 2012). These treatment improvements coincided with the advent of mammography screening programs in many Western countries around 1990. Nonetheless, service studies of mammography screening in Sweden and the Netherlands found significant mortality reductions coincident with the introduction of mammography screening that preceded widespread changes in the primary treatment of breast cancer (Tabár et al, 2001, 2003; Otto et al, 2003). In the case of New Zealand, the implementation of screening mammography (1999) occurred after most of these treatment improvements had become widespread (late 1980s–early 1990s), and lower breast cancer mortality has nonetheless been found to be associated with screening.

To conclude, this study has shown that screening mammography in New Zealand is associated with significantly lower breast cancer mortality for ever-screened women compared with women who have never screened with BSA; that there is evidence of a dose–response relationship between greater screening exposure and lower breast cancer mortality; and that the breast cancer mortality reductions are consistent with prognostic indicators in relation to screening exposure, and with evidence from RCTs and other service studies of screening effectiveness.