Background

The majority of population-based organised breast screening programmes invite women aged 50–69 years to participate in mammography screening. This is based on evidence from randomised controlled trials (RCTs) which show that detection of breast cancer at an early stage through mammography screening leads to a reduction in breast cancer mortality in this age-group [1]. Evidence for screening women aged 70–74 years comes mainly from observational and modelling studies [2, 3], with limited additional evidence from the RCTs; nonetheless, some programmes have expanded the target age-group for screening to include women up to 74 years of age [4]. For women older than 74 years, there is no trial evidence of benefit and hence no specific guidance regarding the net health benefits (versus harms) of continuing mammography screening beyond that age. Despite this, data indicate that women in their late 70s, 80s and 90s are being screened in practice [5]. In the US, where there is no stipulated upper age limit for breast screening, there are no clear recommendations as to whether to continue or stop mammography screening beyond 75 years of age [6,7,8]. The European Commission’s recommendations are for biennial screening in women aged 50–69 years, with a suggestion of screening every 2–3 years up to age 74 years [9].

Australia actively recruits women aged 50–74 years for two-yearly (biennial) mammography screening for breast cancer. Prior to 2013–14, the target age-group was women aged 50–69; it was then extended to invite women aged 70–74, based on a recommendation in a programme evaluation report [10]. This report also recommended that women aged 75 years and over should no longer be eligible to attend the programme given limited evidence of health benefits. Nonetheless, women aged 75 years and older can currently self-select to attend mammography screening. In 2016–2017, 14.4% of women aged 75–79 years, 5.4% of women aged 80–84 years and 1.2% of women aged 85 years and older underwent a screening mammogram despite not being actively invited to screening [11].

Globally, there is no consensus or uniform policy on whether mammography screening should be ceased or even discouraged in older women (and if so what upper age limit should be set) [12]. Importantly, there is little evidence on the balance between potential benefits and harms in older women (specifically 75+) in whom competing causes of death and co-morbidities could render routine screening relatively harmful, and a shorter life expectancy could reduce the likelihood of experiencing benefit from screening. To support population-level, as well as individual, decisions about the age or age-range to stop (or recommend against) screening, we conducted a systematic review of the evidence on the outcomes of mammography screening in older women.

The aim of this study was to systematically review and synthesise the evidence on the outcomes of mammography screening in women aged 75 years and older, to guide screening recommendations.

Methods

We report our methods and results in line with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and provide a completed PRISMA checklist (Supplementary Appendix 1).

Search strategy

Our search strategy was developed based on a Cochrane systematic review of mammography screening (2013), with limits in place regarding age and publication date (Appendix 1). We searched three major databases (Ovid Medline, Embase and CINAHL) and hand-searched all identified systematic reviews of breast cancer screening. We performed forward and backward citation tracking of identified relevant articles and contacted experts in the field for additional studies not located as part of the comprehensive search.

Searches were carried out in the specified databases for publications from 1990 to July 2022, with no language or other restrictions.

Selection criteria

Studies that reported relevant outcomes data for women aged 75+ years undergoing mammography screening using any method (i.e. film, digital, tomosynthesis) in a comparative context (relative to another group; continuing beyond 74 vs stopping at younger age; or screening beyond 74 vs not screened) or a non-comparative design were eligible for inclusion in this review (case reports and case series were excluded). All studies that included women aged 75 years and over were initially included; however, some were later excluded at the data extraction stage if data was not stratified to enable extraction for this age group.

Relevant outcomes included both the benefits (or surrogate outcomes from which benefits could be inferred) and harms of mammography screening. Such outcomes included: breast cancer mortality; all-cause mortality; incidence of advanced breast cancer (or breast cancer stage distributions); prognostic characteristics of screen-detected breast cancers; evidence on treatment patterns (including data on treatment-related morbidity, such as physical adverse effects of treatment, quality of life measures); false-positive mammography; overdiagnosis; overtreatment; anxiety or adverse impact on quality of life; false positive biopsy or surgery for benign findings, false-negative findings (reported as interval cancer rates).

Study selection

Titles and abstracts were screened by one of three investigators (EM, NN, JH) and a research assistant for possible inclusion. Papers were excluded based on title/abstract if it was apparent that any of the inclusion criteria were not met. A random sample of 10% of titles and abstracts was double screened to ensure high levels of agreement. Any article about which the investigator was unsure was included in the list of full text articles to review in the second stage.

Full text versions of articles selected in the screening stage were reviewed independently by two of the four investigators to confirm eligibility for inclusion. Disagreements at the full text stage were resolved by consensus.

Data extraction

Data was extracted by two investigators independently (EM, NN, TL). After completion of data extraction, the two authors reviewed both sets of extracted data and checked for errors and disagreements. Any disagreements were resolved by consensus.

For studies with data not specifically separated into 75 years and older, email contact was attempted with the author requesting additional (age-group specific) data if available (up to two attempts).

Data synthesis

Data from all included studies was extracted and synthesised through tabulation and a narrative synthesis was undertaken.

Quality appraisal

Study quality was appraised by one investigator (EM) using the Risk of Bias in Non-randomised studies of interventions (ROBINS-I) tool [13] or the Quality of primary diagnostic accuracy studies (QUADAS-2) tool [14] with specific consideration of screening specific biases such as lead time and length time bias for observational studies. We also adapted the Risk of Bias criteria used by Carter et al. [15] for modelling studies (using two criteria: transparent assumptions; data validation). Other investigators (NH, JH) were consulted if there were any areas of uncertainty regarding the assessment of the risk of bias for included studies (see also Table 1).

Table 1 Summary of included studies.

Study registration

This review was prospectively registered with PROSPERO (CRD42020203131).

Results

The search strategy yielded 3114 unique titles (Fig. 1). We excluded 2932 records after titles or abstract screening, leaving 182 papers for full-text screen. After full-text screen we excluded 116 publications that did not meet the inclusion criteria. Sixty-six papers were deemed potentially eligible for inclusion. An additional 35 were excluded because data was not stratified for 75 years and older to enable data extraction. Searches of reference lists and additional sources identified 5 additional papers, resulting in a total of 36 studies included in this review [4, 16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50].
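The selection counts reported above can be checked with simple arithmetic. The following is a minimal sketch in Python, using only the numbers stated in the text; the function and variable names are illustrative assumptions and are not part of the review's methods.

```python
# Counts taken from the Results text; all names here are illustrative only.
UNIQUE_TITLES = 3114
EXCLUDED_TITLE_ABSTRACT = 2932
EXCLUDED_FULL_TEXT = 116
EXCLUDED_NOT_STRATIFIED = 35
ADDED_FROM_OTHER_SOURCES = 5


def selection_flow(unique, excl_screen, excl_full_text, excl_unstratified, added):
    """Return the number of records remaining after each selection stage."""
    full_text_screened = unique - excl_screen
    potentially_eligible = full_text_screened - excl_full_text
    with_extractable_data = potentially_eligible - excl_unstratified
    included = with_extractable_data + added
    return {
        "full_text_screened": full_text_screened,
        "potentially_eligible": potentially_eligible,
        "with_extractable_data": with_extractable_data,
        "included": included,
    }


flow = selection_flow(
    UNIQUE_TITLES,
    EXCLUDED_TITLE_ABSTRACT,
    EXCLUDED_FULL_TEXT,
    EXCLUDED_NOT_STRATIFIED,
    ADDED_FROM_OTHER_SOURCES,
)

for stage, count in flow.items():
    print(f"{stage}: {count}")
```

Each intermediate value corresponds to a count stated in the paragraph above (182 full-text papers, 66 potentially eligible, 36 included).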

Fig. 1 Study identification and selection.

Table 1 summarises study characteristics. All included studies contained either observational (n = 27) or modelled/simulated data (n = 9). No RCTs which evaluated mammography screening in women 75 years and older were identified. As such, all included observational studies were subject to potential selection bias, confounding and lead time bias, with the vast majority of observational studies deemed at critical or serious risk of bias and only two studies at moderate risk of bias. Of the modelling studies, most did not report the assumptions made within the model, and a few failed to describe the validation process (Supplementary Appendix 2). As such, two were deemed low risk of bias, six at moderate risk, and one at high risk of bias.

Of the 27 observational studies, 18 followed up women after screening and/or diagnosis [4, 16, 21, 23, 26, 32, 33, 35,36,37, 40,41,42,43, 45, 48,49,50]; follow-up times in these studies ranged from 4 months to 20 years. Three studies did not include any comparison group and provided only descriptive statistics [22, 24, 27], and three studies compared groups by detection method in those with breast cancer [32, 33, 36]. One study had multiple comparisons (by age and screening history) [35]. The remaining studies compared mammography screening outcomes in women aged 75 years and older with screening outcomes in women of other ages (ten studies) [4, 16, 18, 20, 21, 23, 25, 38, 43, 48]; women with differing screening histories (three studies) [44,45,46]; women with differing screening intervals (three studies) [37, 40, 41]; historical unscreened women (one study) [49]; women who did not screen (one study) [42]; and women who stopped screening (two studies) [26, 50] (Table 1).

Of the nine modelling studies included, comparisons and screening histories varied: no comparison (two studies) [28, 29]; comparison of different screening recommendations (one study) [17]; comparison with women who have stopped screening (four studies) [30, 34, 39, 47]; and comparison of screening women of different ages (two studies) [19, 31] (Table 1).
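As a consistency check, the comparison categories described for the observational and modelling studies can be tallied against the stated totals (27 and 9). This is a minimal Python sketch; the category labels are shorthand for the groups named in the text, and the dictionary structure is an illustrative assumption.

```python
# Study counts per comparison type, as stated in the Results text.
# The labels are shorthand and the data structure is illustrative only.
observational = {
    "no comparison group": 3,
    "by detection method": 3,
    "multiple comparisons": 1,
    "women of other ages": 10,
    "differing screening histories": 3,
    "differing screening intervals": 3,
    "historical unscreened women": 1,
    "women who did not screen": 1,
    "women who stopped screening": 2,
}

modelling = {
    "no comparison": 2,
    "different screening recommendations": 1,
    "women who stopped screening": 4,
    "women of different ages": 2,
}


def total_studies(counts):
    """Sum the number of studies across comparison categories."""
    return sum(counts.values())


print("observational:", total_studies(observational))
print("modelling:", total_studies(modelling))
print("all included:", total_studies(observational) + total_studies(modelling))
```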

Outcomes reported in each study varied widely. We have therefore classified outcomes across studies into five groups: measures of health benefits (Table 2); measures of screening harms (Table 3); screening detection measures (Table 4); cancer characteristics (Table 5); and treatment patterns (Table 6). Not all studies presented outcomes in each of these categories.

Table 2 Measures of health benefit.
Table 3 Screening harms.
Table 4 Screening detection measures.
Table 5 Cancer characteristics.
Table 6 Treatment patterns.

Measures of health benefit

The health benefits of screening in women over 75 years were reported using heterogeneous outcome measures that included breast cancer mortality, all cause/other cause mortality, survival, life years gained, and measures of quality-adjusted life years and cost effectiveness (Table 2).

Of the 14 studies that reported breast cancer mortality, two showed a significant decrease in breast cancer mortality among women who had screen-detected breast cancer compared to women whose cancer was detected by other means, including those detected clinically (HR: 0.50 (0.31–0.82), p < 0.001; and HR: 0.38 (0.24–0.61), p < 0.001) [32, 33]. Simon (2013 and 2014) [40, 41] also demonstrated an increased hazard ratio (HR) as the mammography screen interval increased in women who screen (2–5 years screen interval HR: 1.87 (1.10–3.19), 5+ years screen interval or no mammography HR: 3.17 (1.68–5.96); and 2–5 years screen interval HR: 1.62 (1.03–2.54), 5+ years screen interval or none HR: 2.80 (1.57–5.00)). McCarthy [35] showed an increased HR in non-users compared to regular users of mammography screening (HR 75–84 yrs: 2.47 (1.70–3.58)). Schousboe [39] demonstrated a decrease in the number of breast cancer deaths by continuing screening beyond the age of 75, with various estimates stratified by Charlson Comorbidity score (CCS) and age for continuing screening (CCS0: compared to stopping at 75 yrs, deaths averted per 1000 screens, 80 y: 1.7 (1.2–2.2); 85 y: 2.8 (2.0–3.6); 90 y: 3.5 (2.5–4.4)).

In contrast, four studies [23, 37, 44, 45] showed no significant differences in breast cancer mortality between screen-detected cancers in women aged 75 years and older and their comparator, comprising younger women (66–74 vs 75–84 years: 0.24% (0.21–0.27%) vs 0.29% (0.25–0.34%)) [23], those who did not attend their last screen (RR: 2.87 (95%CI 0.62–13.2)) [44], those who had not participated (Rate ratio: 1.05 (0.27–4.14)) [45], and those who had never/less frequently attended screening (HR: 0.67 (0.31–1.44)) [37]. One study showed similar estimates for breast cancer deaths averted for those screened annually from 40 to 75 years vs those screened annually from 40 to 85 years (6.9 v 7.2 per 1000 women) [30].
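The distinction drawn above between significant and non-significant ratio estimates can be illustrated by checking whether each reported interval excludes the null value of 1. The sketch below is in Python and assumes that the bracketed ranges are 95% confidence intervals (this is stated explicitly only for some of the estimates); the function name and list layout are illustrative.

```python
def interval_excludes_null(lower, upper, null_value=1.0):
    """Return True when the interval lies entirely above or below the null."""
    return lower > null_value or upper < null_value


# (label, lower bound, upper bound) for ratio measures quoted in the text.
# Assumption: each range is a 95% confidence interval.
reported_intervals = [
    ("[32] HR 0.50, screen-detected vs other detection", 0.31, 0.82),
    ("[33] HR 0.38, screen-detected vs other detection", 0.24, 0.61),
    ("[35] HR 2.47, non-users vs regular users", 1.70, 3.58),
    ("[44] RR 2.87, did not attend last screen", 0.62, 13.2),
    ("[45] rate ratio 1.05, had not participated", 0.27, 4.14),
    ("[37] HR 0.67, never/less frequent attendance", 0.31, 1.44),
]

for label, lower, upper in reported_intervals:
    verdict = "excludes 1" if interval_excludes_null(lower, upper) else "includes 1"
    print(f"{label}: ({lower}, {upper}) {verdict}")
```

An interval that excludes 1 indicates statistical significance at the corresponding level only; it says nothing about whether the underlying comparison is informative or free of lead time bias.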

Only two cohort studies compared the risk of breast cancer death between women who stopped screening at 75 years and those who continued [26, 50], both of which reported a non-significant difference in breast cancer mortality. Garcia-Albeniz (2020) reported the 8-year risk of breast cancer death of 0.07 per 1000 women [26] and Richman reported a breast cancer mortality hazard ratio of 0.87 (95% CI 0.55–1.37) [50]. Two microsimulation modelling studies [34, 47], however, indicated benefits in terms of life-years gained (LYG) for continuing screening into older age (difference in LYG per 1000 women, stopping at 79 vs 69: 23.5 [34]; 4.8–7.8 LYG per 1000 screens when screening a woman at age 80 [47]), as well as a reduction in breast cancer mortality (7% reduction in BC mortality) [34].

All four studies that reported survival showed that screening increased survival (Table 2) [32, 35, 36, 49].

Demb et al. [23] showed that across age-groups (66–74 vs 75–84 vs 85–94) of women who had at least one screen, breast cancer mortality modestly increased (0.24% (0.21–0.27%) vs 0.29% (0.25–0.34%) vs 0.31% (0.21–0.43%)), whereas mortality from other causes substantially increased (14.5% (14.3–14.8%) vs 35.7% (35.3–36.1%) vs 65.4% (64.3–66.5%)). Additional outcomes data, including by co-morbidity where reported, are shown in Table 2.
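To make the scale of competing mortality concrete, the point estimates quoted from Demb et al. [23] can be expressed as a ratio of other-cause to breast cancer mortality in each age-group. This is a minimal Python sketch using only the percentages above; the ratio is an illustrative summary chosen here and is not a measure reported by the source study.

```python
# Point estimates (%) quoted from Demb et al. [23]:
# age-group -> (breast cancer mortality, other-cause mortality)
demb_mortality = {
    "66-74": (0.24, 14.5),
    "75-84": (0.29, 35.7),
    "85-94": (0.31, 65.4),
}


def other_to_breast_ratio(breast_pct, other_pct):
    """Other-cause mortality expressed as a multiple of breast cancer mortality."""
    return other_pct / breast_pct


ratios = {
    age_group: other_to_breast_ratio(breast, other)
    for age_group, (breast, other) in demb_mortality.items()
}

for age_group, ratio in ratios.items():
    print(f"{age_group}: other-cause deaths are about {ratio:.0f}x breast cancer deaths")
```

The ratio rises steeply with age because other-cause mortality increases several-fold while breast cancer mortality changes only modestly.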

Measures of screening harms

Measures that were considered screening harms included rates of false positives and recalls, biopsy, and overdiagnosis (Table 3).

In the seven studies that reported on false positives and recalls [17, 18, 21, 30, 31, 39, 43], the proportions of false positive tests and recalls for women aged 75 years and over who attended screening and the comparator group were generally similar, with the exception of two studies showing a significant additional number of false positive screens [39] and false-positive (benign) biopsies [17] associated with continuing screening beyond 75, compared to stopping at the age of 75.

Overdiagnosis was estimated in seven studies (two of which provided multiple comparisons) [19, 30, 31, 39, 47, 49, 50], all of which reported an increase in overdiagnosis from screening older age-groups. However, the estimated magnitude (and defined measure) of increase in overdiagnosed cancers varied substantially across studies, ranging from 0.5 to 0.6 per 1000 screens for women aged 76 years in average health (G-E model) [31] to 47% of breast cancer cases among screened women aged 75–84 years old being overdiagnosed [50].

Screening detection measures

Screening detection measures included cancer detection rates, invasive cancer detection rates, DCIS rates, interval cancer rates and positive predictive value (PPV) (Table 4); where reported, we also considered cancer incidence rates (Table 4).

Of the six studies that reported cancer detection or diagnosis rates [22, 24, 27, 42, 43, 50], these rates ranged from 4.85 to 9.4 per 1000 screens in women aged 75+. Only three studies provided a comparator group; Upneja [43] reported a total breast cancer diagnosis rate of 9.4 per 1000 for women 75 and older compared to 7.3 per 1000 women aged 67–74; Smith-Bindman [42] compared screening to no screening and provided a relative risk (RR) of breast cancer of 3.6 (95% CI: 3.3–4.0); and Richman [50] compared women aged 75–84 years who continue screening to women who do not, and reported a cumulative incidence of breast cancer of 4.85 per 100 in those who attend screening compared to 2.56 per 100 in those who do not attend further screens.
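The cumulative incidence figures quoted from Richman [50] can be turned into a crude absolute and relative contrast. The Python sketch below uses only the two values stated above; the resulting figures are unadjusted arithmetic on the quoted point estimates and should not be read as estimates reported by the source study.

```python
# Cumulative incidence of breast cancer per 100 women, as quoted from [50].
CONTINUED_SCREENING_PER_100 = 4.85
STOPPED_SCREENING_PER_100 = 2.56


def crude_contrast(exposed, unexposed):
    """Return the (difference, ratio) of two cumulative incidences."""
    difference = exposed - unexposed
    ratio = exposed / unexposed
    return difference, ratio


risk_difference, risk_ratio = crude_contrast(
    CONTINUED_SCREENING_PER_100, STOPPED_SCREENING_PER_100
)

print(f"crude difference: {risk_difference:.2f} per 100 women")
print(f"crude ratio: {risk_ratio:.2f}")
```

A higher incidence among women who continue screening reflects additional detection, which may include overdiagnosed cancers, and is not in itself evidence of benefit.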

Five of the eight studies that reported on the detection rate of invasive cancers provided an age-group comparison. The rates were similar or slightly higher in the older group (compared to 66–74 or 70–74) [4, 18, 21, 25], whilst one study showed a slight decrease in 10-year cumulative incidence of invasive breast cancer among older women [23]. One study compared women who continue screening with women who stop screening [50] and demonstrated a significant increase in localised invasive breast cancer incidence in those who continue screening (RD 1.65 (95%CI 1.21–2.03)).

Of the seven studies that reported on DCIS, four showed either no change or a decrease in DCIS rates as women aged (1.6 v 2.0 per 1000 screens [4]; 2.6 vs 2.6 per 1000 [18, 21]; 10-year cumulative incidence 1% v 0.7% [23]) and two showed an increase (rate per 1000 women screened, 70–74 years: 0.87 vs 75+ years: 0.97 [25]; RR compared to no screening 4.9 (3.5–6.9) [42]). Richman [50] showed an increased risk of DCIS detection in women who continue screening compared to women who stop screening (RD 0.64 (95% CI 0.46–0.79)).

Breast cancer characteristics

Cancer characteristics that were presented included proportion of cancers that were node positive; stage distribution of the cancers detected; and tumour size (Table 5).

In the three studies that compared younger and older women, older women were less likely to have an advanced stage cancer detected [21, 25, 38]; however, the comparators included some age-groups as young as 40–49 years. These findings are consistent with studies reporting smaller tumour sizes as women age [25, 38], and with regular repeat screening intervals [46].

Treatment patterns

Treatment patterns were reported by three studies [24, 26, 27] (Table 6); however, only one study included a comparator [26]. Almost all women 75 years and older with screen-detected cancers received some form of treatment. In the one study with a comparison there was evidence that continuing to screen was associated with a higher proportion of less radical surgical treatment (radical mastectomy, continuing screening vs stopping screening: 14.2% v 17.0%) [26].

Discussion

The studies included in our systematic review used heterogeneous methods to assess and report on a range of outcomes for mammography screening in older women. Given this heterogeneity, we have summarised study-specific findings in evidence tables (since pooling of data would not be appropriate), noting there was mixed evidence about the benefits of continuing mammography screening beyond the age of 75 years. The few studies that reported on breast cancer mortality as an outcome gave mixed results: about equal numbers of studies showed a beneficial effect [32,33,34,35, 39,40,41] or no effect on mortality [23, 26, 30, 37, 44, 45, 50], and/or used un-informative comparisons.

Although fewer studies reported on the harms of screening beyond the age of 75, the evidence presented on various harms (false positives, recalls, biopsy, and overdiagnosis) was generally more consistent. Specifically, there was consistent evidence that screening into older age increases overdiagnosis [19, 30, 31, 39, 47, 49, 50], which can be partially explained by the shorter follow-up time possible with older women and by higher competing causes of mortality [51].

The evidence reported in this review should be interpreted in light of the various limitations we identified. Many studies used comparisons that were not informative about the health impact of screening into older age or used comparisons that could bias towards an effect from screening: the ideal comparison to assess the impact of screening beyond the age of 75 would be to compare those who continue screening with those who stop screening at the age of 75. Only two observational studies undertook this comparison [26, 50], although some modelling studies simulated this scenario [30, 34, 39, 47]. The results of these studies are summarised in Appendix 2. When considering only these studies, three of the modelling studies indicated a benefit, whereas both observational studies and one modelling study did not, and all six studies reported harms. One study showed similar estimates in breast cancer deaths averted [30], two showed no difference in risk of breast cancer death [26, 50], two indicated benefit in terms of life years gained [34, 47] and two estimated a reduction in breast cancer mortality [34, 39]. We also see an additional number of false positive screens [39], an increase in the false positive biopsy rate [39], an increase in the incidence of breast cancer [50] (including an increase in both invasive cancer detection rates [50] and DCIS rates [50]) and an increase in overdiagnosis [30, 39, 47, 50].

The other comparisons made by included studies do not provide direct evidence on the health benefits and harms of continuing to screen beyond age 75. Studies that compare older women to younger age groups, compare according to screening history or screening interval, or compare by whether a cancer is screen-detected or diagnosed clinically (by physician or patient) are prone to lead time bias and as a result may be inherently biased towards screening, and do not tell us how health outcomes change if the woman chooses to stop vs continue screening beyond age 75. For example, a study with a more informative comparison (stop vs continue screening) [26] indicates a non-significant breast cancer death hazard ratio (1.00; 95% CI 0.83–1.19), whereas a study with a less informative comparison (within age-group clinical detection vs screen-detected) [32] suggests a significant benefit to screen detection (HR 0.5; 95% CI 0.31–0.82). As such, the results of many of the studies with less informative comparisons need to be interpreted with caution.

Several studies reported screening detection measures, showing similar or higher cancer detection rates (depending on the comparison used) or PPV for recall (detection yield) for screening older women [4, 18, 21, 26, 43], although several studies did not have a comparator [22, 24, 27]. These metrics provide information about the performance of the screening process, but they provide less knowledge about the health benefit of continuing to screen. This comparison is particularly problematic when made by age-group, as the detection metrics generally differ between younger and older women. For example, mammograms are more sensitive in older women, and underlying cancer rates are higher, so more cancer detection would be expected at screening in older women (compared to younger groups); but this is not equivalent to evidence on whether screening at 75 years and older, as opposed to stopping, confers a mortality benefit.

Likewise, comparing cancer characteristics and treatment patterns between younger and older women is un-informative about screening effects because breast cancer biology and treatment are known to differ between younger and older women, so in extreme age, i.e. 75 years and older, these differences would be expected and could be more evident. Therefore reported differences related to cancer characteristics between age-groups do not provide direct evidence about health benefit of screening into older age.

It is important to note that the potential benefits of screening do change as women age. The sensitivity of screening increases, but so too does the competing risk of death from other causes. Screening older women might not be effective in terms of mortality reduction, even if mammography screening detects early-stage breast cancer well, if most deaths in those older than 75 are not from breast cancer. This is highlighted in the data from Demb [23], where the cumulative incidence of breast cancer deaths is around 0.3% and so screening older women will make only a small difference to this proportion. At the same time, the cumulative incidence of other causes of death becomes very dominant and increases significantly as women age (66–74 years: 14.5% (95% CI: 14.3–14.8%); 75–84 years: 35.7% (95% CI: 35.3–36.1%); 85 years +: 65.4% (95% CI: 64.3–66.5%)), also increasing with higher co-morbidities [23]. Considering co-morbidities, one study highlighted that the estimated breast cancer mortality reduction from screening decreased with increasing age and with higher co-morbidity score [39], and other studies reported that incremental life-years gained for continuing screening diminished in those with more severe co-morbidity [31, 39]. On the other hand, one could argue that early detection of breast cancer in this age-group may reduce treatment burden, and hence represents an important outcome (even if evidence on mortality reduction is limited). Very few eligible studies reported treatment patterns and only one had a comparison, highlighting that those who continued to screen were more likely to receive conservative breast surgery than those who stopped screening beyond 75 years [26].

Quality appraisal showed all studies were prone to bias, with most observational studies rated at serious risk of bias and most modelling studies rated at moderate risk of bias. However, caution should be taken when interpreting these findings because we applied established quality appraisal tools for observational studies [13, 14], and in the absence of a standard tool for appraising modelling studies we adapted criteria from Carter et al. [15]. As a result, it seems likely that more stringent criteria were applied to observational studies. This highlights the challenges in assessing the quality of modelling studies, especially with regard to the assumptions applied in models. For example, when modelling overdiagnosis, only three [31, 39, 47] of the five papers included assumptions allowing for DCIS that is non-progressive, and only two studies [39, 47] included non-progressive invasive cancer in their assumptions. Many of the modelling papers did not state this clearly in their methods, and further details were sought from cited earlier work. It is possible that the assumptions made in these studies may not be consistent with current understanding of the natural history of breast cancers with regard to non-progressive disease, which could bias estimates and result in an under-estimation of overdiagnosis.

As with all studies evaluating the impact of screening programmes, lead time bias must be considered. The studies included in this review are no exception: with no RCTs available for inclusion, lead time bias will be evident in included studies. As such, the benefit shown in survival of all four studies cannot be taken as evidence of screening benefit, as their results will be affected by lead time bias [32, 35, 36, 49].

Given the limited quality and mixed evidence about the benefits of continuing mammography screening beyond the age of 75 years, older women should be presented with the opportunity to make an informed decision based on their values and an understanding of the lack of evidence in this area. Decision aids have been shown to be effective in enabling older women to make more informed decisions regarding mammography screening [52].

Conclusion

Despite many studies having reported on outcomes of screening women aged 75 and older, findings from this systematic review highlight the limited evidence available from high quality studies to make a recommendation for or against continuing breast screening beyond the age of 75 years. Many of the comparisons used in published studies are not directly informative as far as benefit or harms associated with continuing to screen (as opposed to stopping) beyond 75. Further studies with more informative comparisons, specifically comparing continuing versus stopping screening at 75 years, are required before definitive recommendations can be made.