The breast cancer screening programmes in the United Kingdom currently invite women aged 50–70 years for screening mammography every 3 years. Since the time the screening programmes were established, there has been debate, at times sharply polarised, over the magnitude of their benefit and harm, and the balance between them. The expected major benefit is reduction in mortality from breast cancer. The major harm is overdiagnosis and its consequences; overdiagnosis refers to the detection of cancers on screening, which would not have become clinically apparent in the woman’s lifetime in the absence of screening.
Professor Sir Mike Richards, National Cancer Director, England, and Dr Harpal Kumar, Chief Executive Officer of Cancer Research UK, asked Professor Sir Michael Marmot to convene and chair an independent panel to review the evidence on benefits and harms of breast screening in the context of the UK breast screening programmes. The panel, authors of this report, reviewed the extensive literature and heard testimony from experts in the field who were the main contributors to the debate.
The nature of the information communicated to the public, which has also sparked debate, was not part of the panel’s terms of reference, which are listed in Appendix 1.
1.2 Relative mortality benefit
The purpose of screening is to advance the time of diagnosis so that prognosis can be improved by earlier intervention. A consequence of earlier diagnosis is that it increases the apparent incidence of breast cancer in a screened population and extends the average time from diagnosis to death, even if screening were to confer no benefit. The appropriate measure of benefit, therefore, is reduction in mortality from breast cancer in women offered screening compared with women not offered screening.
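The lead-time effect described above can be illustrated with a minimal Python sketch. The three cases, their ages and the 2-year lead time are hypothetical values chosen only for illustration, not figures from the trials. The age at death is deliberately held fixed, so screening confers no benefit in this example.

```python
from dataclasses import dataclass


@dataclass
class Case:
    clinical_dx_age: float  # age at which the cancer would present symptomatically
    death_age: float        # age at death, held fixed: screening gives no benefit here


def survival_from_diagnosis(case: Case, lead_time_years: float = 0.0) -> float:
    """Years from diagnosis to death when diagnosis is advanced by the lead time."""
    diagnosis_age = case.clinical_dx_age - lead_time_years
    return case.death_age - diagnosis_age


# Hypothetical cases and lead time (assumptions, not data from the report).
cases = [Case(60, 65), Case(62, 70), Case(68, 71)]
LEAD_TIME_YEARS = 2.0

unscreened = [survival_from_diagnosis(c) for c in cases]
screened = [survival_from_diagnosis(c, LEAD_TIME_YEARS) for c in cases]

mean_unscreened = sum(unscreened) / len(unscreened)
mean_screened = sum(screened) / len(screened)

print(f"mean survival without screening: {mean_unscreened:.2f} years")
print(f"mean survival with screening:    {mean_screened:.2f} years")
```

Mean survival from diagnosis rises by exactly the lead time although no death is postponed, which is why mortality, not survival, is the appropriate measure of benefit.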
In the panel’s judgement, the best evidence for the relative benefit of screening on mortality reduction comes from 11 randomised controlled trials (RCTs) of breast screening. Meta-analysis of these trials with 13 years of follow-up estimated a 20% reduction in breast cancer mortality in women invited for screening. The relative reduction in mortality will be higher for women actually attending screening, but by how much is difficult to say because women who do not attend are likely to have a different background risk. Three types of uncertainties surround this estimate of 20% reduction in breast cancer mortality. The first is statistical: the 95% confidence interval (CI) around the relative risk (RR) reduction of 20% was 11–27%. The second is bias: there are a number of potential sources of distortion in the trials that have been widely discussed in the literature ranging from suboptimal randomisation to problems in adjudicating cause of death. The third is the relevance of these old trials to the current screening programmes. The panel acknowledged these uncertainties, but concluded that a 20% reduction is still the most reasonable estimate of the effect of the current UK screening programmes on breast cancer mortality. Most other reviews of the RCTs have yielded similar estimates of relative benefit.
The RCTs were all conducted at least 20–30 years ago. More contemporary estimates of the benefit of breast cancer screening come from observational studies. The panel reviewed three types of observational studies. The first were ecological studies comparing areas, or time periods, when screening programmes were and were not in place. These have generated diverse findings, partly because of the major advances in treatment of breast cancer, which have a demonstrably larger influence on mortality trends than does screening, and partly because of the difficulty of excluding imbalances in other factors that could affect breast cancer mortality. The panel did not consider these studies helpful in estimating the effect of screening on mortality. The other two types of studies, case–control studies and incidence-based mortality studies, showed breast screening to confer a greater benefit than did the trials. Although these studies, in general, attempted to control for non-comparability of screened and unscreened women, the panel was concerned that residual bias could inflate the estimate of benefit. However, the panel notes that these studies’ findings are in the same direction as the trials.
1.3 Absolute mortality benefit
Estimates of absolute benefit of screening have varied from one breast cancer death avoided for 2000 women invited to screening to one avoided for about 100 women screened, about a 20-fold difference. Major determinants of that large variation are the age of women screened, and the durations of screening and follow-up. The age of the women invited is important, as mortality from breast cancer increases markedly with age. The panel therefore applied the relative mortality reduction of 20% to the observed cumulative absolute risk of breast cancer mortality over the ages 55–79 years for women in the United Kingdom, assuming that women who began screening at 50 years would gain no benefit in the first 5 years, but that the mortality reduction would continue for 10 years after screening ended. This yielded the estimate that for every 235 women invited to screening, one breast cancer death would be prevented; correspondingly 180 women would need to be screened to prevent one breast cancer death. Uncertainties in the figure of a 20% RR reduction would carry through to these estimates of absolute mortality benefit. Nonetheless, the panel’s estimate of benefit is in the range of one breast cancer death prevented for ∼250 women invited, rather than the range of 1 in 2000.
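The arithmetic linking the relative reduction to the number of women invited per death prevented can be sketched as follows. The baseline cumulative risk is not quoted in this summary, so the sketch back-calculates the value implied by the reported figure of 235; that implied value, the function name and the direct use of the confidence limits are assumptions made for illustration.

```python
RELATIVE_REDUCTION = 0.20       # central estimate from the meta-analysis
CI_LOW, CI_HIGH = 0.11, 0.27    # 95% CI for the relative reduction
NNI_REPORTED = 235              # women invited per death prevented
NNS_REPORTED = 180              # women screened per death prevented


def number_needed_to_invite(baseline_risk: float, relative_reduction: float) -> float:
    """Reciprocal of the absolute risk reduction."""
    return 1.0 / (baseline_risk * relative_reduction)


# Baseline risk implied by the reported figures (back-calculated, not quoted).
implied_baseline_risk = 1.0 / (NNI_REPORTED * RELATIVE_REDUCTION)

nni_central = number_needed_to_invite(implied_baseline_risk, RELATIVE_REDUCTION)
nni_if_benefit_low = number_needed_to_invite(implied_baseline_risk, CI_LOW)
nni_if_benefit_high = number_needed_to_invite(implied_baseline_risk, CI_HIGH)

# Ratio of screened to invited implied by the two reported figures.
implied_screened_to_invited = NNS_REPORTED / NNI_REPORTED

print(f"implied baseline risk: {implied_baseline_risk:.2%}")
print(f"women invited per death prevented: {nni_central:.0f} "
      f"(range {nni_if_benefit_high:.0f} to {nni_if_benefit_low:.0f})")
print(f"implied screened/invited ratio: {implied_screened_to_invited:.2f}")
```

Under these assumptions, the statistical uncertainty alone moves the figure between roughly 170 and 430 women invited per death prevented, which remains far from 1 in 2000.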
1.4 Overdiagnosis
The major harm of screening considered by the panel was that of overdiagnosis. Given the definition of an overdiagnosed cancer, either invasive or non-invasive, as one diagnosed by screening, which would not otherwise have come to attention in the woman’s lifetime, there is need for a long follow-up to assess the frequency of overdiagnosis. In the view of the panel, some cancers detected by screening will be overdiagnosed, but the uncertainty surrounding the extent of overdiagnosis is greater than that for the estimate of mortality benefit because there are few sources of reliable data. The issue for the UK screening programmes is the magnitude of overdiagnosis in women who have been in a screening programme from age 50 to 70, then followed for the rest of their lives. There are no data to answer this question directly. Any estimate will therefore be, at best, provisional.
Although the definition of an overdiagnosed case, and thus the numerator in a ratio, is clear, the choice of denominator has been the source of further variability in published estimates. Different studies have used: only the cancers found by screening; cancers found during the whole screening period, both screen-detected and interval; cancers diagnosed during the screening period and for the remainder of the women’s lifetime. The panel focused on two estimates: the first from a population perspective using as the denominator the number of breast cancers, both invasive and ductal carcinoma in situ (DCIS), diagnosed throughout the rest of a woman’s lifetime after the age that screening begins, and the second from the perspective of a woman invited to screening using the total number of breast cancers diagnosed during the screening period as the denominator.
The panel thought that the best evidence came from three RCTs that did not systematically screen the control group at the end of the screening period and followed these women for several more years. The frequency of overdiagnosis was of the order of 11% from a population perspective, and about 19% from the perspective of a woman invited to screening. Trials that included systematic screening of the control group at the end of the active part of the trial were not considered to provide informative estimates of the frequency of overdiagnosis.
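The two perspectives can be written side by side. The symbols below are notation introduced here for illustration, not the panel’s own: O is the number of overdiagnosed cancers, N_scr the number of cancers diagnosed during the screening period, and N_after the number diagnosed after screening ends.

```latex
\[
p_{\text{population}} = \frac{O}{N_{\text{scr}} + N_{\text{after}}} \approx 11\%,
\qquad
p_{\text{invited}} = \frac{O}{N_{\text{scr}}} \approx 19\%
\]
\[
\frac{p_{\text{population}}}{p_{\text{invited}}}
  = \frac{N_{\text{scr}}}{N_{\text{scr}} + N_{\text{after}}}
  \approx \frac{0.11}{0.19} \approx 0.58
\]
```

Because the numerator is shared and only the denominator differs, the population figure is necessarily the smaller of the two. The ratio of about 0.58 is an inference from the rounded percentages, not a number reported by the panel.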
Information from observational studies was also considered. One method that has been used is investigation of time trends in incidence rates of breast cancer for different age groups over the period that population screening was introduced. The published results of these studies varied greatly and have been interpreted as providing either reassurance or cause for alarm. So great was the variation in results that the panel conducted an exercise by varying the assumptions and statistical methods underlying these studies, using the same data sets; estimates of overdiagnosis rates were found to vary across the range of 0–36% of invasive breast cancers diagnosed during the screening period. The panel had no reason to favour one set of estimates over another, and concluded that this method could give no reliable estimate of the extent of overdiagnosis.
Were it possible to distinguish at screening those cancers that would not otherwise have come to attention from those that, untreated, would lead to death, the overdiagnosis problem could be much reduced, at least in terms of unnecessary worry and treatment. Currently this is not possible, so neither the woman nor her doctor can know whether a screen-detected cancer is an ‘overdiagnosed’ case or not. In particular, DCIS, most often diagnosed at screening, does not inevitably equate to overdiagnosis – screen-detected DCIS, after wide local excision (WLE) only, is associated with subsequent development of invasive breast cancer in 10% of women within 10 years.
The consequences of overdiagnosis matter: women are turned into patients unnecessarily, surgery and other forms of cancer treatment are undertaken, and quality of life and psychological well-being are adversely affected.
1.5 The balance of benefit and harm
The panel estimates that an invitation to breast screening delivers about a 20% reduction in breast cancer mortality. For the UK screening programmes, this currently corresponds to about 1300 deaths from breast cancer being prevented each year, or equivalently about 22 000 years of life being saved. However, this benefit must be balanced against the harms of screening, especially the risk of overdiagnosis. In the panel’s view, overdiagnosed cancers certainly occur, but the frequency in a screening programme of 20 years’ duration is unknown. Estimates from trials of shorter duration suggest overdiagnosis of about 11% as a proportion of breast cancer incidence during the screening period and for the remainder of the woman’s lifetime, or equivalently about 19% as a proportion of cancers diagnosed during the screening period. Any excess mortality stemming from the investigation and treatment of breast cancer is considered by the panel to be small and considerably outweighed by the benefits of treatment. Some other harms, including increased anxiety and discomfort caused by screening, are also acknowledged.
Notionally, for 10 000 women invited to screening, from age 50 for 20 years, it is estimated that 681 cancers (invasive and DCIS) will be diagnosed, of which 129 will represent overdiagnosis (using the 19% estimate of overdiagnosis) and 43 deaths from breast cancer will be prevented.
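The notional cohort figures can be reproduced from numbers already given in this summary. The sketch below is a plain arithmetic check. The variable names are illustrative, and rounding to whole women is an assumption about how the figures were presented.

```python
COHORT_SIZE = 10_000                 # women invited from age 50 for 20 years
CANCERS_DIAGNOSED = 681              # invasive and DCIS, during the screening period
OVERDIAGNOSIS_FRACTION = 0.19        # invited-woman perspective
WOMEN_INVITED_PER_DEATH_PREVENTED = 235

# Overdiagnosed cases: 19% of cancers diagnosed during the screening period.
overdiagnosed = round(CANCERS_DIAGNOSED * OVERDIAGNOSIS_FRACTION)

# Deaths prevented: one per 235 women invited.
deaths_prevented = round(COHORT_SIZE / WOMEN_INVITED_PER_DEATH_PREVENTED)

print(f"overdiagnosed cancers per {COHORT_SIZE} invited: {overdiagnosed}")
print(f"breast cancer deaths prevented per {COHORT_SIZE} invited: {deaths_prevented}")
```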
Given that the treatment for breast cancer has improved, is screening no longer relevant? The panel’s view is that the benefits of screening and those of better treatments are reasonably considered independent. Uncertainty about possible interaction between the benefits of screening and of contemporary treatments is not a reason for stopping breast screening.
The panel was not asked to comment on costs, either of interventions or of the consequences of overdiagnosis. With accurate figures, an estimate of cost-benefit could be made and compared with other interventions, but this would be a significant piece of work in its own right.
An individual woman cannot know whether she is one of the numbers who will benefit or be harmed from screening. If she chooses to be screened, it should be in the knowledge that she is accepting the chance of benefit, having her life extended, knowing that there is also a risk of overdiagnosis and unnecessary treatment. Similarly, a woman who declines the invitation to screening needs to recognise that she runs a slightly higher risk of dying from breast cancer.
1.6 Conclusions and recommendations
Breast screening extends lives. The panel’s review of the evidence on benefit – the older RCTs, and those more recent observational studies – points to a 20% reduction in mortality in women invited to screening. A great deal of uncertainty surrounds this estimate, but it represents the panel’s overview of the evidence. This corresponds to one breast cancer death averted for every 235 women invited to screening for 20 years, and one death averted for every 180 women who attend screening.
The panel’s best estimate is that the breast screening programmes in the United Kingdom, inviting women aged 50–70 every 3 years, prevent about 1300 breast cancer deaths a year, a most welcome benefit to women and to the public health.
However, there is a cost to women’s well-being. In addition to extending some lives by early detection and treatment, mammographic screening detects cancers, proven to be cancers by pathological testing, that would not have come to clinical attention in the woman’s life, were it not for screening – this is called overdiagnosis. The consequence of overdiagnosis is that women have their cancer treated by surgery, radiotherapy and medication, but neither the woman nor her doctor can know whether this particular cancer would be one that could possibly lead to death, or one that would have remained undetected for the rest of the woman’s life.
The panel sought to estimate the level of overdiagnosis in women screened for 20 years and followed to the end of their lives. Estimates of overdiagnosis abound, from near to zero to 50%, but there is a paucity of reliable data to answer this question. There has not even been agreement on how to measure overdiagnosis. On the basis of follow-up of three RCTs, the panel estimated that in women invited to screening, about 11% of the cancers diagnosed in their lifetime constitute overdiagnosis, and about 19% of the cancers diagnosed during the period that women are actually in the screening programme; but the panel emphasises these figures are the best estimates from a paucity of reliable data.
Putting together benefit and overdiagnosis from the above figures, the panel estimates that for 10 000 UK women invited to screening from age 50 for 20 years, about 681 cancers will be found of which 129 will represent overdiagnosis, and 43 deaths from breast cancer will be prevented. In round terms, therefore, for each breast cancer death prevented, about three overdiagnosed cases will be identified and treated. Of the ∼307 000 women aged 50–52 who are invited to screening each year, just over 1% would have an overdiagnosed cancer during the next 20 years. Given the uncertainties around the estimates, the figures quoted give a spurious impression of accuracy.
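The round-term ratios and the programme-level totals can be checked against one another. In this sketch every input is a figure quoted in this summary. Dividing the annual number of women invited by 235 is offered as a consistency check, on the assumption (not confirmed here) that the annual estimate of about 1300 deaths was derived in a similar way.

```python
COHORT_SIZE = 10_000
OVERDIAGNOSED_IN_COHORT = 129
DEATHS_PREVENTED_IN_COHORT = 43
WOMEN_INVITED_PER_YEAR = 307_000        # women aged 50-52 invited each year
WOMEN_INVITED_PER_DEATH_PREVENTED = 235
DEATHS_PREVENTED_PER_YEAR_REPORTED = 1_300
LIFE_YEARS_SAVED_PER_YEAR_REPORTED = 22_000

# Overdiagnosed cases per death prevented ("about three").
overdiagnosed_per_death_prevented = OVERDIAGNOSED_IN_COHORT / DEATHS_PREVENTED_IN_COHORT

# Chance of an overdiagnosed cancer for a woman invited ("just over 1%").
overdiagnosis_risk = OVERDIAGNOSED_IN_COHORT / COHORT_SIZE

# Consistency check on the annual figure of about 1300 deaths prevented.
deaths_prevented_per_year = WOMEN_INVITED_PER_YEAR / WOMEN_INVITED_PER_DEATH_PREVENTED

# Implied years of life saved per death prevented (derived, not quoted).
life_years_per_death = (LIFE_YEARS_SAVED_PER_YEAR_REPORTED
                        / DEATHS_PREVENTED_PER_YEAR_REPORTED)

print(f"overdiagnosed per death prevented: {overdiagnosed_per_death_prevented:.1f}")
print(f"overdiagnosis risk per woman invited: {overdiagnosis_risk:.2%}")
print(f"deaths prevented per year: {deaths_prevented_per_year:.0f}")
print(f"life-years per death prevented: {life_years_per_death:.1f}")
```

The implied figure of roughly 17 years of life per death prevented is derived here from the two quoted totals and should be read as indicative only.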
The panel concludes that the UK breast screening programmes confer significant benefit and should continue. The greater the proportion of women who accept the invitation to be screened, the greater is the benefit to the public health in terms of reduction in mortality from breast cancer. However, for each woman the choice is clear. On the positive side, screening confers a likely reduction in mortality from breast cancer because of early detection and treatment. On the negative side is the knowledge that she has perhaps a 1% chance of having a cancer diagnosed, and treated with surgery and other modalities, which would never have caused problems had she not been screened.
Evidence from a focus group conducted by Cancer Research UK and attended by two panel members, and in line with previous similar studies, was that this was an offer many women will feel is worth accepting: the treatment of overdiagnosed cancer may cause suffering and anxiety, but that suffering is worth the gain from the potential reduction in breast cancer mortality. Clear communication of these harms and benefits to women is of utmost importance and goes to the heart of how a modern health system should function. There is a body of knowledge on how women want information presented, and this should inform the design of information to the public.
2.1 The UK NHS breast screening programmes
The NHS breast cancer screening programme in England began inviting women to be screened in 1988. This followed the recommendations made by Professor Sir Patrick Forrest in his report on breast screening in 1986 (Forrest, 1986). The breast screening programmes in the United Kingdom currently invite women aged 50–70 years for a screening mammography every 3 years. The mammography is designed to detect changes in the breast tissue that may indicate the presence of cancer. The screening programme in England is currently conducting a randomised trial to ascertain whether there would be benefit in extending the age at which women are invited to 47–73 years.
2.2 Principles of screening
Screening is concerned with the detection of disease at an early stage, with the expectation that treatment will be more effective if begun earlier in the disease process. Screening is therefore based on the principle of there being an effective treatment. It is well recognised that an apparent benefit of increased survival time could be illusory because of simply bringing forward the time of diagnosis without changing the course of the disease. Therefore, the appropriate way to assess benefit is to look at breast cancer mortality of screened and unscreened cohorts rather than just survival time from diagnosis (see section 3).
As the principle of screening is to diagnose cases earlier, at any particular time point during the period of successive screenings, there will be more cases of breast cancer in a group of screened women compared with a similar group of unscreened women. However, it is possible that some of these additional cases may be cancers that would not otherwise have been diagnosed or caused the woman any problem during her lifetime. These cancers are referred to as overdiagnosis (see section 4).
2.3 The debate over benefits and harms of breast screening
Since the screening programmes were established, there has been debate over the potential benefits and harms. Recently, the debate has focussed on the reduction in mortality attributable to screening, the numbers of women overdiagnosed, and the way that the risks and benefits are communicated to women invited for screening. The arguments have become quite polarised between those who believe that the benefit of decreased breast cancer mortality outweighs the harms and those who believe the harms outweigh the benefit. These differing views of the evidence have arisen, in part, from disagreements over the validity and applicability of the available RCTs of breast screening, and from questions about the usefulness and interpretation of observational data on breast cancer incidence and mortality.
The debate over the benefits and harms of breast screening is not unique to the UK and the NHS breast screening programmes. In 2002, the International Agency for Research on Cancer at the World Health Organisation reviewed the evidence on breast screening, and put forward recommendations on further research and on implementing screening programmes (IARC, 2002). The US Preventive Services Task Force in 2009 re-examined the efficacy of different screening modalities. They recommended that women under the age of 50 not be routinely screened, and that women aged 50–74 have biennial rather than annual screens (Woolf, 2010). The Canadian Task Force on Preventive Health Care updated their guidelines on breast screening in 2011, and concluded that the reduction in mortality associated with screening mammography is small for women aged 40–74 years at average risk of breast cancer. They also found a greater reduction in mortality for women aged ≥50 compared with those <50, and that harms of overdiagnosis and unnecessary biopsy may be greater for younger women than for older women. They recommended that women aged 50–74 be routinely screened but stated that appreciable uncertainty exists around the evidence for this (Canadian Task Force on Preventive Health Care, 2011). Published reports from the Nordic Cochrane Centre concluded that, despite their substantial methodological limitations, the trials of screening showed that screening saved lives, but at the cost of considerable harm from overdiagnosis (Gøtzsche and Nielsen, 2011).
2.4 Breast cancer in the UK
Incidence and mortality
In the United Kingdom, breast cancer remains the most commonly diagnosed cancer in women (48 417 cases in 2009) and is the second most common cause of death from cancer in women (11 556 deaths in 2010). UK breast cancer incidence rates have been rising in all age groups since the late 1970s (Figure 1A). The causes of these increasing rates are thought to include: increased use of hormone replacement therapy; later age at childbirth; lower parity; and increasing obesity and alcohol intake in women. Also, there is believed to be better ascertainment, especially in older women. In common with most countries, the introduction of the screening programme for women aged 50–64 in 1988 and those aged 65–70 in 2001 led to additional increases in incidence (Figure 1A).
By contrast with incidence rates, since the early 1990s, the mortality rates for breast cancer have been decreasing – shown both as annual mortality rates and 35-year cumulative risk of dying from breast cancer (Figure 1B). It is believed that the causes of these decreases may include: improvement in treatment, in particular adjuvant therapies; specialisation and better organisation of cancer care; screening; and increased breast awareness (Appendix 2).
Contribution of screening to decreased breast cancer mortality
It is widely agreed that screening alone cannot be the major factor responsible for the decrease in breast cancer mortality over the last 20 years. Improvements in treatment and service delivery are likely to have made the largest contribution to decreased mortality (Berry et al, 2005). Indeed, without effective treatment, screening for breast cancer is redundant. However, it is important to establish what contribution, if any, screening makes, given that it requires the use of substantial resources within the health system, and nearly two million women each year in England alone accept the invitation and agree to be screened (The NHS Information Centre, 2012).
2.5 Independent review of breast screening
It is within this context that Professor Sir Mike Richards, National Cancer Director, England, and Dr Harpal Kumar, Chief Executive Officer of Cancer Research UK, asked Professor Sir Michael Marmot to chair an independent panel to review breast screening. The panel’s terms of reference are shown in Appendix 1. This panel has reviewed the extensive literature and heard testimony from many of the experts in the field. This report details its findings and recommendations for the breast screening programme in England.
2.6 Independent review panel membership
The independent panel consisted of nationally and internationally recognised experts in epidemiology and/or medical statistics, as well as in current breast cancer diagnosis and treatment practices. A patient advocate was an integral member of the panel. No panel member had previously published on breast screening, thus helping to ensure an objective and independent assessment of the evidence.
The panel was chaired by Professor Sir Michael G Marmot, Director of the Institute of Health Equity, University College London; Chair, WHO Commission on Social Determinants of Health; Chair, Marmot Review – Strategic Review of Health Inequalities in England after 2010; Chair, European Review on the Social Determinants of Health and the Health Divide; and MRC Research Professor of Epidemiology and Public Health, University College London, with long-standing research on social determinants of health and health inequalities.
The other panellists were:
Professor Douglas G Altman, Director of the Centre for Statistics in Medicine and Cancer Research UK Medical Statistics Group, University of Oxford. Doug’s varied research interests include the use and abuse of statistics in medical research, studies of prognosis, regression modelling, systematic reviews, randomised trials, and studies of medical measurement. He is actively involved in efforts to improve the quality of scientific publications by promoting transparent and accurate reporting of health research.
Professor David A Cameron, Clinical Director of the Edinburgh Cancer Research Centre, Director of Cancer Services at NHS Lothian, and Professor of Oncology at Edinburgh University. Previously, David was the Director of the NIHR National Cancer Research Network and Professor of Oncology at Leeds University. His research interests are in translational and clinical trials in breast cancer, and he is the principal investigator of several clinical trials looking at treatment of early breast cancer. Before qualifying as a medical doctor, he completed an undergraduate degree in Mathematics.
Professor John A Dewar, Consultant and honorary Professor of Clinical Oncology. Until recently, John was Head of Oncology at Ninewells Hospital, Dundee. John has a long-standing interest in the management of patients with breast cancer and has been closely involved in clinical trials of both radiotherapy and systemic therapy for breast cancer.
Professor Simon G Thompson, Director of Research in Biostatistics at the University of Cambridge. Simon’s research interests are in meta-analysis and evidence synthesis, clinical trial methodology, health economic evaluation, and cardiovascular epidemiology. He has collaborated on a number of major clinical trials, recently including all the major UK national trials of screening and treatment for abdominal aortic aneurysms.
Maggie Wilcox, patient advocate. Maggie was a health visitor for many years and then worked as a Clinical Nurse Specialist in palliative care before her breast cancer diagnosis in 1997. After early retirement following her treatment, she became involved in patient advocacy in cancer services and research. She now provides a patient voice at national and local level as a member of various organisations, including the National Cancer Research Institute Breast Clinical Study Group and the Surrey, West Sussex and Hampshire Network Breast Site Specific Group.
2.7 Independent review process and role of secretariat
As set out in the review’s terms of reference, the secretariat provided initial key literature on breast cancer screening, including publications recommended from both sides of the debate. The panel then called on a range of experts (see Appendix 1 for full list) to give evidence.
Cancer Research UK and the Department of Health provided the secretariat function for the review, comprising:
Dr Dulcie McBride, Consultant in Public Health Medicine, Department of Health
Sara Hiom, Director of Information, Cancer Research UK
Nick Ormiston-Smith, Data Analysis and Research Manager, Cancer Research UK
Dr Martine Bomb, Programme Manager, Cancer Research UK
Samantha Harrison, Programme Officer, Cancer Research UK
The secretariat acted purely as support to the panel in the practical, writing, and dissemination functions and had no say in the conclusions or recommendations. Further information can be found in Appendix 1.
3. The effect of breast screening on mortality
This section summarises the panel’s views of the effect of breast screening on mortality. Specifically, the aim is to estimate the effect of the current national screening programmes in the United Kingdom on breast cancer mortality. Estimates of relative risk reduction, absolute risk reduction, and increase in life expectancy are discussed.
Randomised controlled trials potentially provide the most reliable information about the effects of breast screening. Well-conducted RCTs are prone to fewer distorting effects, or biases, than observational studies. Systematic reviews and meta-analyses of RCTs are widely accepted as the highest level of evidence for guiding policy decisions on medical interventions. For this reason, our quantitative estimate of the benefits of breast screening comes from the randomised trials of breast screening. Given the wealth of observational studies on this issue, in section 3.6 we look to observational studies as a possible guide to more contemporary estimates of the effects of screening on mortality.
Randomised controlled trials, however, are not without their problems in practice. Lack of internal validity, for example, through failures in proper randomisation, losses to follow-up and misclassification of end points, can lead to biased estimates of effects. Differences between the trials and the current UK context, for example, in the type of screening undertaken or in the length of follow-up, lead to a lack of external validity. Both the internal and external validity of the RCTs of breast screening have been widely discussed.
A specific issue raised by some commentators is that most of the randomised trials of breast screening date from the 1980s or earlier. Treatment and overall management of breast cancer have improved considerably since that time. Are the trials still relevant? Such a question can be asked of any area of medical investigation and treatment; trials refer to the past and our use of interventions relates to the future. It is an important area of judgement and one that the panel kept at the forefront of its consideration.
The purpose of screening is to prolong survival, but length of survival from diagnosis of breast cancer to death cannot be used as an end point in the RCTs, because the cancers diagnosed by screening are diagnosed earlier than those diagnosed without screening. Thus, even in the absence of any therapy, a cancer diagnosed earlier by screening will show longer survival from diagnosis than the same cancer presenting later symptomatically. Mortality after invitation to screening is the appropriate end point. However, concerns have been raised about the use of breast cancer mortality. If the adjudication of a death as due to breast cancer is influenced by the woman’s screening history, then the estimate of the effects on breast cancer mortality can become biased. For this reason, some have argued that death from all cancers, or indeed all-cause mortality, should be the primary outcome of interest in the trials. The panel disagrees with this view (section 3.5). We also comment on the estimation of absolute risk differences, as opposed to RRs, and the difference between the effects expressed per woman invited and per woman screened.
The panel’s view is that although the trials are far from perfect, they offer the most reliable evidence on the RR reduction in breast cancer mortality to be derived from screening.
3.2 Available randomised trials
Eleven randomised trials have been undertaken and reported (New York health insurance plan (HIP), Malmö I and II, Swedish Two County (Kopparberg and Östergötland), Canada I and II, Stockholm, Göteborg, UK Age trial, and Edinburgh; Table 1). The three trials with two parts have sometimes but not always been reported separately in publications. Three other randomised trials are mentioned in the Cochrane Review (Gøtzsche and Nielsen, 2011), but were excluded because they compared multiple interventions (not just mammography), or made major post-randomisation exclusions. We also exclude these three studies from our assessment.
All the trials compared women invited to screening with a control group not invited. However, they varied considerably, for example, in terms of the method of randomisation, age group of women invited, type of mammography employed, whether physical examination or self-examination was also used in either the invited or control groups, interval between screens, number of screens, length of follow-up, and system used for adjudicating breast cancer deaths (Table 1).
The invited and control groups in the trials were constructed either by randomising individuals, or by randomising clusters (geographical areas or general practices), or by allocation according to day of birth. Individual randomisation, with adequate allocation concealment, is rightly regarded as the most reliable method. For population screening studies, however, cluster randomisation can also be adequate, provided sufficient clusters are randomised and balance in social and other characteristics is achieved. Women are identified through existing registers, and so it is unlikely that participation bias, which afflicts some cluster trials (Puffer et al, 2003), would apply (for example, through women moving between areas in order to avoid or obtain an invitation to breast screening). Similarly, using allocation by day of birth would seem to be adequate for population screening trials. Of the trials considered, the Edinburgh trial suffered the most problems in terms of its cluster randomisation (Gøtzsche and Nielsen, 2011), with some re-allocations and post-randomisation exclusions of clusters, which led to severe baseline imbalances (26% of women in the control group and 53% in the invited group were in the highest socioeconomic group). For this reason, like the Cochrane Review, we exclude the Edinburgh trial from our main summary and comment on its results separately (section 3.5).
The trials recruited women of different ages (Table 1). Most overlapped extensively with the age group 50–70 years, relevant to the UK programmes, but some (e.g. UK Age trial, Malmö II) did not. We base our primary conclusions about RR on all the trials, as the RR appears fairly constant across age groups (Nyström et al, 2002). There is some evidence, however, that the RR may be attenuated in women under age 50 (Canadian Task Force on Preventive Health Care, 2011), so we also consider an analysis that excludes these women.
Duration of follow-up
Even in the pre-screening era, the median survival from diagnosis of breast cancer was several years, so any benefits of screening in terms of mortality are not immediate, but will accrue over time. So the best evidence would come from a trial with a long duration of follow-up, comparing the invited group with a control group who are never invited to screening. The data that come nearest to this are for the age group 55–69 in Malmö I, with a follow-up of 19 years. Most of the trials, however, started systematic screening of the control group after 4–10 years. Little effect on mortality is seen within the first 5 years of screening, so we regard a follow-up period of about 10–15 years after randomisation as providing the most reliable estimate of the RR. A shorter follow-up time would put too much weight on the early period after initial screening, whereas a longer period would include a greater diluting effect of screening in the control group. So we base our primary conclusions about breast cancer mortality on the data reported in the Cochrane Review, which provided results for 13 years of follow-up of the groups as randomised (Gøtzsche and Nielsen, 2011).
Adjudicating cause of death
Potential biases from classifying cause of death have been a major source of contention, especially in the Swedish trials. Ascribing a death as primarily due to breast cancer, or not due to breast cancer, is not always easy or reliable. So, when the screening history of a woman is known, or when a prior diagnosis of breast cancer has been made, this could influence the adjudicated cause of death. There are two ways in which this could distort the results of the trials. The first is overt bias, in which investigators closely involved with the trial adjudicate cause of death and tend to avoid ascribing the cause of death as breast cancer when the woman has been screened (and conversely when she has not). This would exaggerate any beneficial effect of screening. This bias (which may be subconscious) is avoided by the use of an independent end point committee to ascribe causes of death, or by the use of death certificates from national registries. These methods, however, do not avoid a second way in which a trial’s results might be affected: screening increases the number of breast cancers diagnosed, and such a diagnosis may lead preferentially to classifying a subsequent death as due to breast cancer rather than any other cause. This second bias operates against any beneficial effect of screening.
Most trials used an independent end point committee to adjudicate causes of death or took the underlying cause of death from national registries (Table 1). Some of the Swedish trials were criticised for using trial investigators to ascribe cause of death, but subsequent evaluations were made using independent and consensus committees and national registry statistics (Nyström et al, 2002; Tábar et al, 2011). Although the exact numbers of deaths from breast cancer were not the same when adjudication was made using different methods, the overall estimates of RR of breast cancer mortality did not change very much. Thus, although this issue is certainly one of the major criticisms of the trials, the panel does not think it would exaggerate the estimates of RR reduction obtained from individual trials, or indeed from a meta-analysis of trials. We comment on the use of other mortality end points in section 3.5.
Many other aspects of the trials have been discussed in the literature, some of which we mention here. The numbers of women reported in each randomised group have not been identical across the multiple publications from certain trials. Although this is somewhat concerning, it is perhaps not surprising, given that population and other registers are not always fully reliable, and data checks over time reveal duplicates and other problems. Moreover, some publications are based on birth cohorts and others on exact age groups (Nyström et al, 2002). The trials report excluding women with a prior diagnosis of breast cancer. Although this is sensible, it can lead to problems if the exclusions are more easily made in the invited group (for example, because of more information obtained at screening) than in the control group. Some trials include physical examination or self-examination in either or both of the randomised groups. However, there is no evidence that these procedures influence breast cancer mortality (Canadian Task Force on Preventive Health Care, 2011).
We acknowledge the problems and biases discussed above, but judge them as unlikely to have had a major distorting effect on the overall result from a meta-analysis of the trials. Moreover, the biases considered do not all operate in the same direction, with some favouring screening and some acting against it. Although it is easy to be critical of many detailed aspects of the breast screening trials, the relevant judgement is whether the biases are so great as to make their results too misleading for guiding policy. The panel does not believe this to be the case, especially in contrast to the problems in interpreting the results from observational studies (section 3.6).
3.3 Meta-analysis of RRs
As discussed above, we focus on the deaths ascribed to breast cancer in 10 of the 11 randomised trials (excluding Edinburgh) and the meta-analysis conducted in the Cochrane Review, using 13 years of follow-up (analysis 1.2 in Gøtzsche and Nielsen, 2011). We do not distinguish the trials labelled ‘adequately randomised’ and ‘sub-optimally randomised’ in the Cochrane Review, but consider the totality of evidence across all the trials. We also use random-effects rather than fixed-effect meta-analysis to estimate an average effect across the trials. Using random effects acknowledges that the trials may be estimating different quantities, which is likely given their clinical heterogeneity, whereas a fixed-effect analysis estimates an assumed common effect across all the trials. The results are shown in Figure 2 along with the RRs of breast cancer mortality. The overall RR, comparing invited with control women, is 0.80 (95% CI 0.73–0.89). There was some heterogeneity in the RRs from different trials, but this was not statistically significant (Figure 2). Thus, the RR reduction in breast cancer mortality in the groups invited to screening is estimated as 20% (95% CI 11–27%).
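The distinction between fixed-effect and random-effects pooling can be illustrated with a minimal sketch. It uses the DerSimonian-Laird moment estimator, which is one common random-effects method; the text does not state which estimator was used, so this is an assumption. The trial-level RRs and standard errors below are hypothetical and are not taken from Table 1 or Figure 2.

```python
import math

def pool_log_rr(log_rr, var):
    """Pool log relative risks; return (fixed, random) as (RR, lower, upper)."""
    # Fixed-effect: inverse-variance weights, assuming one common effect.
    w = [1.0 / v for v in var]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, log_rr)) / sw

    # DerSimonian-Laird moment estimate of the between-trial variance tau^2.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_rr))
    k = len(log_rr)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects: each trial's variance is inflated by tau^2.
    w_re = [1.0 / (v + tau2) for v in var]
    sw_re = sum(w_re)
    random_ = sum(wi * y for wi, y in zip(w_re, log_rr)) / sw_re

    def summary(est, total_w):
        se = math.sqrt(1.0 / total_w)
        return tuple(math.exp(x) for x in (est, est - 1.96 * se, est + 1.96 * se))

    return summary(fixed, sw), summary(random_, sw_re)

# Hypothetical trial-level inputs, for illustration only.
rrs = [0.70, 0.85, 1.00, 0.75]
ses = [0.10, 0.12, 0.15, 0.08]
fixed, random_ = pool_log_rr([math.log(r) for r in rrs], [s ** 2 for s in ses])
print("fixed  RR %.2f (%.2f-%.2f)" % fixed)
print("random RR %.2f (%.2f-%.2f)" % random_)

# Converting the reported RR 0.80 (0.73-0.89) into an RR reduction in per cent.
rr_reduction = [round((1 - x) * 100) for x in (0.80, 0.89, 0.73)]
print(rr_reduction)  # point estimate, lower limit, upper limit
```

Because the between-trial variance is never negative, the random-effects interval is at least as wide as the fixed-effect one, which is the pattern described when the two analyses are compared.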
The RR for women invited to screening is attenuated compared with that for women who actually attend screening (Cuzick et al, 1997). This is because some invited women do not attend, and they may be assumed to get no benefit from the invitation. If the underlying rate of breast cancer mortality in non-attenders is the same as in attenders, one may estimate the RR reduction in attenders as the RR reduction in those invited divided by the (average) attendance rate. (Taking the typical attendance in the trials as about 80% (Table 1), this would give 20% divided by 0.80, or 25%.) However, this calculation is incorrect as the underlying risk is different in those not attending screening (Zackrisson et al, 2004; Moss et al, 2006). Without this extra information, which is not available for all trials, the calculation of the RR reduction in those attending screening is not possible. In contrast, the calculation can be made, irrespective of underlying risk differences, for the absolute risk reduction (section 3.4). We note that the coverage rate in the UK NHS screening programme is similar to that in the trials, at 77% (The NHS Information Centre). Some non-systematic (opportunistic) screening occurred in the control groups of the trials, but detailed information is not available. This is ignored in our calculations, and will lead to the effect of attending screening being somewhat underestimated.
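A short sketch may help to show why the simple division works for absolute but not for relative risk reductions. It assumes that non-attenders gain nothing from invitation and that there is no screening in the control group. The parameter `risk_ratio_nonatt` (baseline risk in non-attenders divided by baseline risk in attenders) is a hypothetical quantity introduced for illustration; the trials do not generally report it.

```python
def rrr_in_attenders(rrr_invited, attendance, risk_ratio_nonatt=1.0):
    """RR reduction in attenders implied by the RR reduction in those invited.

    risk_ratio_nonatt is hypothetical: baseline breast cancer mortality in
    non-attenders relative to attenders (1.0 means equal underlying risk).
    """
    # Share of control-arm deaths that would occur in potential attenders.
    share = attendance / (attendance + (1 - attendance) * risk_ratio_nonatt)
    return rrr_invited / share

def ard_in_attenders(ard_invited, attendance):
    """Absolute risk reduction in attenders; needs no baseline-risk assumption."""
    return ard_invited / attendance

print(rrr_in_attenders(0.20, 0.80))        # equal risks: the 25% quoted above
print(rrr_in_attenders(0.20, 0.80, 1.5))   # higher-risk non-attenders: larger
print(ard_in_attenders(0.0043, 0.77))      # about 0.56%, as used in section 3.4
```

With equal underlying risks the result reduces to division by the attendance rate. Any other value of `risk_ratio_nonatt` changes the relative figure, whereas the absolute figure is unaffected.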
Other estimates of overall RR
Other meta-analyses of the breast cancer screening trials have given different estimates of the RR reduction. We summarise some of these below.
The Cochrane Review undertook a fixed-effect meta-analysis of the above trials with 13 years follow-up, and reported an estimated RR of 0.81 (95% CI 0.74–0.87). As expected, the fixed-effect analysis gives a slightly narrower CI, but the estimated average RR reduction of 19% is similar to the figure of 20% above.
If women <50 years in the above trials are excluded, the overall RR reported in the Cochrane Review (analysis 1.6, Gøtzsche and Nielsen, 2011) is 0.77 (95% CI 0.69–0.86). So the RR reduction is estimated as 23%, slightly more than the 20% above based on all age groups.
The Cochrane Review (Gøtzsche and Nielsen, 2011) focused on the Canada, Malmö, and UK Age trials as the only ‘adequately randomised’ trials. The estimated RR of breast cancer mortality over 13 years follow-up for invited vs control groups in these trials was 0.90 (95% CI 0.79–1.02), whereas in the trials considered ‘sub-optimally randomised’ it was 0.75 (0.67–0.83). As a compromise between these two estimates, the authors concluded that a 15% RR reduction was plausible.
The US Task Force (Nelson et al, 2009) provided estimated RRs of breast cancer mortality of 0.86 (95% CI 0.75–0.99) for women aged 50–59 years invited to screening, and of 0.68 (95% CI 0.54–0.87) for those aged 60–69 years. These correspond to RR reductions of 14% and 32%, respectively, with an inverse variance weighted average of 19%.
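The inverse variance weighted average quoted above can be approximately reproduced from the two reported RRs and their CIs. The sketch below assumes that pooling is done on the log RR scale and that standard errors are recovered from the 95% CI limits; neither detail is stated in the text.

```python
import math

def iv_weight(lower, upper, z=1.96):
    """Inverse-variance weight from a 95% CI, working on the log RR scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return 1.0 / se ** 2

# (RR, lower, upper) for ages 50-59 and 60-69, as reported above.
strata = [(0.86, 0.75, 0.99), (0.68, 0.54, 0.87)]
weights = [iv_weight(lo, hi) for _, lo, hi in strata]
pooled_log = sum(w * math.log(rr) for w, (rr, _, _) in zip(weights, strata)) / sum(weights)
pooled_rrr = 1 - math.exp(pooled_log)
print(f"pooled RR reduction: {pooled_rrr:.0%}")
```

The younger age group has the narrower CI and therefore the larger weight, which pulls the average towards 14% rather than 32%.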
The Canadian Task Force (Canadian Task Force on Preventive Health Care, 2011) gave an estimate of the RR of breast cancer mortality for invited vs control groups of 0.79 (95% CI 0.68–0.90) for women aged 50–69 years, a RR reduction of 21%. Routinely screening for breast cancer with mammography every 2–3 years for this age group was rated as a weak recommendation, based on moderate-quality evidence according to GRADE criteria (Schünemann et al, 2011).
A review by Duffy et al (2012) of all the trials and age groups gave an overall RR of 0.79 (95% CI 0.73–0.86) comparing invited with control groups, corresponding to a 21% RR reduction in breast cancer mortality.
Different meta-analyses include different trials, durations of follow-up, and definitions of outcome. Nevertheless, there is general agreement in their estimates, of about a 20% RR reduction in breast cancer mortality from invitation to screening.
Generalisability of RRs
A key issue is whether the RR reduction in breast cancer mortality observed in the trials may be taken as applying, at least approximately, to the current UK screening programmes. This is a judgement about external validity, rather than an issue for which much direct empirical evidence is available. As always in policy decision making, we need to use evidence from studies undertaken in the past to make an inference about what is likely in the future. Although RRs are often much more generalisable across contexts than absolute risk differences, it is clearly plausible that RRs could change in new situations. Of particular concern in breast screening is that many of the trials were undertaken a long time ago, that the techniques of mammography have changed considerably, that DCIS is now commonly diagnosed through screening (section 4.6), that the treatments for breast cancer, particularly the drug treatment that can eradicate microscopic spread, have become more effective, and that the overall mortality rate from breast cancer has decreased in the United Kingdom and other countries. These points were put to the panel by some expert witnesses. One could therefore argue that breast screening is now less effective/relevant because even later stage cancers can be treated and/or cured, so there is less need to diagnose breast cancers earlier. However, there is a counter argument that because the systemic drug treatments are only partially effective, it could be that the major improvements that drug treatments have brought in cure rates are in fact in part due to breast screening: by diagnosing more cancers at an earlier stage, contemporary drug treatments have a better chance of eradicating microscopic disease, and thus the gains in survival would not have been as great if breast screening did not exist.
Both views have some supporting arguments, but the panel found no convincing evidence that one or other was more likely to be correct. Thus, the panel’s view is that the appropriate manner in which to view the benefits of screening and those of better treatments is that these effects are independent, and thus that the estimates of the relative reduction in breast cancer mortality achieved with screening are the same now as 20 years ago. However, the uncertainty about whether there could be an interaction between the benefits of screening and of contemporary treatments is not a reason for stopping breast screening.
Particular aspects for which there is at least some evidence about the external validity of the trials relate to age, screening intensity, and follow-up time. The RR does not appear to change much across the age range 50–69 years (Nyström et al, 2002), but it may be reduced below the age of 50 (Canadian Task Force on Preventive Health Care, 2011). The RR does not appear to depend strongly on the number of screens, or the screening interval, at least across the ranges studied in the trials. The only randomised trial that compared different screening intervals is inconclusive (Breast Screening Frequency Trial Group, 2002). Reports from trials with long follow-up suggest that little benefit in terms of breast cancer mortality is seen in the first 5 years after starting screening, and that the benefit lasts for at least 10 years after cessation of screening. This is not surprising, given the slow progression rates of many breast cancers.
The panel concludes that the current screening programmes in the United Kingdom, which invite women aged 50–70 every 3 years to undergo mammography, are likely to deliver about a 20% reduction in breast cancer mortality at ages 55–79 years. Clearly, there is uncertainty in this figure. In addition to the uncertainty owing to the limited numbers of breast cancer deaths across the trials, there are potential biases in the trials and concerns about the generalisability of results from the trials to the current UK screening programmes. We note, however, that the level of disagreement in the literature about the RR reduction is minor in comparison to the controversy about the absolute risk reduction.
3.4 Absolute risk reduction
The above discussion suggests a natural way to estimate the absolute risk reduction that applies to the current screening programmes in the United Kingdom. For women aged 50 invited to screening, we assume no benefit in breast cancer mortality until age 55, a 20% reduction at ages 55–79, and no change in the rates of other causes of death. An estimated 1.70% of UK women aged 50 are currently expected to die from breast cancer between the ages of 55 and 79; this is calculated from UK mortality rates (2008–2010) and takes into account the risks of dying from other causes. Since the UK programme has existed since the late 1980s, one may assume that this risk has already been reduced by 20% through screening. Hence, the risk without the screening programme would have been 2.13% (as 1.70/2.13=0.80), and the estimated absolute risk reduction is 2.13−1.70=0.43%.
The number of women needed to be invited for screening for 20 years starting at age 50 in order to prevent one death from breast cancer is therefore 1/0.43%=235. An alternative way of expressing this is that, for every 10 000 women invited into the screening programme at age 50, about 43 deaths from breast cancer would be prevented.
The absolute risk reduction for women attending screening can be estimated as the absolute risk reduction in those invited divided by the average coverage rate in the NHS breast screening programme (77%), so about 0.43%/0.77=0.56%. The number of women needed to be screened for 20 years to prevent one death from breast cancer is then 1/0.56%=180. For every 10 000 women attending screening from age 50–70 years, about 56 deaths from breast cancer would be prevented.
The above calculations are based on the same principles as those used in some publications (Advisory Committee on Breast Cancer Screening, 2006). Essentially, the RR reduction from the trials is regarded as approximately generalisable to the current UK screening programmes, and the corresponding absolute risk reduction is calculated by applying this RR reduction to the national rates of breast cancer mortality for an appropriate age group. The considerable uncertainty in the estimated RR reduction of 20%, as discussed in section 3.3, of course carries through to these estimates of absolute risk reduction.
The NHS screening programme estimates that 1400 lives are saved per year in England owing to breast screening (Advisory Committee on Breast Cancer Screening, 2006). For comparison and illustrative purposes, the panel estimates that for the 307 000 women (aged 50–52) who each year receive their first invitation to a 20-year screening programme (3-year average 2008/2009–2010/2011, The NHS Information Centre), 0.43% of 307 000, or about 1300, deaths from breast cancer per year are prevented. This is close to the NHS screening programme’s estimate.
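The arithmetic in this section can be sketched as follows, using only the figures given above. Unrounded intermediate values are carried through, so the results differ slightly from those obtained by hand from the rounded percentages.

```python
risk_with_screening = 0.0170   # risk of breast cancer death at ages 55-79
rr_reduction = 0.20            # assumed relative benefit of invitation
coverage = 0.77                # NHS programme coverage
cohort = 307_000               # women receiving a first invitation each year

# Back out the risk that would have applied without the programme.
risk_without = risk_with_screening / (1 - rr_reduction)     # about 2.13%
arr_invited = risk_without - risk_with_screening            # about 0.43%
needed_to_invite = 1 / arr_invited                          # about 235

# Per woman attending: divide the absolute reduction by coverage.
arr_screened = arr_invited / coverage                       # about 0.55-0.56%
needed_to_screen = 1 / arr_screened                         # roughly 180

deaths_prevented = arr_invited * cohort                     # roughly 1300

print(f"{risk_without:.2%} {arr_invited:.2%} {needed_to_invite:.0f}")
print(f"{arr_screened:.2%} {needed_to_screen:.0f} {deaths_prevented:.0f}")
```

The whole calculation scales directly with the assumed 20% RR reduction and the 1.70% baseline, so the uncertainty in either carries straight through to every derived figure.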
Different methods and estimates in the literature
The marked difference in estimates of absolute risk reduction proposed in the literature is one of the greatest sources of controversy about the value of breast cancer screening (McPherson, 2010). The different estimates stem from the very varied methods used for their calculation. When calculations are made directly from the trials’ data themselves, the absolute risk reduction depends overwhelmingly on the underlying risk of breast cancer, which is principally governed by the age groups considered, the length of follow-up, and the population studied. Although this is obvious, it has also been empirically shown by comparing different durations of follow-up in the Swedish Two County trial (Tábar et al, 2011).
The Cochrane Review (Gøtzsche and Nielsen, 2011) focused on the Canada, Malmö, and UK Age trials as the only ‘adequately randomised’ trials. The absolute risk of breast cancer death in the control groups of these trials was low (overall rate of 0.33%), partly because of the inclusion of the large UK Age trial (women initially aged 39–41) and the 13-year follow-up period considered rather than the 25-year period from age 55–79, used above by the panel. With the Cochrane Review authors’ estimated 15% RR reduction, this leads to an estimated absolute risk reduction of 0.05%, or equivalently that 2000 women need to be invited to screening to prevent one breast cancer death.
An entirely different estimate is given by Duffy et al (2010) based on 22 years of follow-up for those aged 50–69 in the Swedish Two County trial, which estimated a 38% reduction in breast cancer mortality. The calculation considers the absolute risk reduction per woman screened across the 7 years of screening in the trial, and makes the strong assumption that the absolute benefits can simply be multiplied up to reflect the 20 years of screening in the UK programmes. This leads to an estimated absolute risk reduction of 0.88% in women screened, or equivalently that 113 women need to be screened to prevent one breast cancer death.
The US Task Force (Nelson et al, 2009) considered a period of 7 years of invitation to screening and 13 years of follow-up after first invitation. For ages 50–59 years, they estimated that 1339 women needed to be invited to prevent one death from breast cancer. For ages 60–69 years, their corresponding estimate was 377 women.
The Canadian Task Force (Canadian Task Force on Preventive Health Care, 2011) estimated from the trials that screening 720 women aged 50–69 years once every 2–3 years for about 11 years would prevent one death from breast cancer.
Beral et al (2011) summarised various published estimates of absolute risk reduction from the literature, and concluded that around one breast cancer death would be prevented in the long term for every 400 women aged 50–70 years regularly screened over a 10-year period, based on a previous review (Advisory Committee on Breast Cancer Screening, 2006).
From the above examples, it is clear that different methods of estimation give about a 20-fold difference in the estimates of absolute risk reduction. The panel’s view is that to estimate the impact of the UK screening programmes on absolute risk of dying of breast cancer, it is necessary to consider the relevant underlying risk of breast cancer to which the RR reduction from the trials should apply. The panel believes this is best derived from the current UK national rate of breast cancer deaths for women aged 55–79 years. Calculations made directly from the absolute risks observed in the trials are heavily, and often misleadingly, influenced by the age groups included and the length of follow-up available (Beral et al, 2011). Estimates also depend on whether they are expressed per woman invited or per woman screened. We note, however, that to the extent that the absolute rate of breast cancer mortality in the United Kingdom is currently declining, the absolute risk reduction from the UK screening programme would also be expected to decline correspondingly in the future.
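The spread of published estimates can be tabulated directly from the numbers quoted in this section. The sketch below converts each number needed to invite or screen into the corresponding absolute risk reduction and computes the ratio between the extremes. The estimates are not like-for-like: they differ in age group, duration, and whether they are per woman invited or per woman screened, which is the point of the comparison.

```python
# Numbers needed to invite or screen to prevent one breast cancer death,
# as quoted in the text above.
estimates = {
    "Panel (invited)": 235,
    "Panel (screened)": 180,
    "Cochrane Review (invited)": 2000,
    "Duffy et al 2010 (screened)": 113,
    "US Task Force, 50-59 (invited)": 1339,
    "US Task Force, 60-69 (invited)": 377,
    "Canadian Task Force (screened)": 720,
    "Beral et al 2011 (screened)": 400,
}

for label, n in sorted(estimates.items(), key=lambda item: item[1]):
    print(f"{label:32s} 1 in {n:5d}  ARR {100 / n:.2f}%")

fold_difference = max(estimates.values()) / min(estimates.values())
print(f"ratio of extremes: {fold_difference:.1f}")

# The Cochrane figure follows from a low control-group risk and a 15% RR reduction.
cochrane_arr = 0.0033 * 0.15
print(f"Cochrane ARR {cochrane_arr:.3%}, about 1 in {1 / cochrane_arr:.0f}")
```

The ratio of the largest to the smallest number needed is a little under 18, consistent with the approximately 20-fold difference described above.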
Life expectancy gained
A reduction in the risk of death from breast cancer will lead to an increase in life expectancy. As breast cancer is only one of many causes of death, the average gain in life expectancy from the UK screening programme is likely to appear modest. An estimate can easily be derived by contrasting the life expectancy for women aged 50, using current national rates of breast cancer mortality and deaths from other causes, to that which would apply if the rates of breast cancer mortality were 25% higher in each year from age 55–79 years. (25% higher corresponds to the assumed 20% benefit from screening, as 1.25=1/0.80.) This calculation leads to an estimate of 0.073 years (or 27 days) of life gained on average for each woman aged 50 invited to screening. To put this in perspective, the panel noted that complete abolition of deaths from breast cancer would add 159 days on average to life expectancy for women aged 50.
We also note that this is a crude average of a zero gain for the vast majority of women and a substantial gain for a few. Alternative but equivalent ways of expressing this gain are as follows: (a) for the 307 000 women aged 50–52 who are invited for screening each year, about 22 000 years of life will be saved; (b) for each 10 000 women invited to screening, 730 years of life will be saved; (c) for each 10 000 attending screening, about 950 years of life will be saved; (d) given that 1 in about 180 women attending screening avoid breast cancer death, such a woman would expect to gain on average an extra 17 years of life.
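The equivalent expressions (a) to (d) follow from the average gain of 0.073 years by simple scaling, as the sketch below shows. It takes the 0.073 years, the 77% coverage, and the figure of 1 in about 180 from the text; the life-table calculation behind the 0.073 years is not reproduced here.

```python
gain_per_invited = 0.073     # years of life gained per woman invited
coverage = 0.77              # proportion of invited women who attend
cohort = 307_000             # women first invited each year
needed_to_screen = 180       # women screened per breast cancer death avoided

days_per_invited = gain_per_invited * 365                   # about 27 days
years_for_cohort = gain_per_invited * cohort                # (a) about 22 000
years_per_10k_invited = gain_per_invited * 10_000           # (b) 730
years_per_10k_attending = years_per_10k_invited / coverage  # (c) about 950

# (d) the whole gain among attenders accrues to the 1 in 180 who benefit.
years_per_beneficiary = (gain_per_invited / coverage) * needed_to_screen

print(round(days_per_invited), round(years_for_cohort),
      round(years_per_10k_invited), round(years_per_10k_attending),
      round(years_per_beneficiary))
```

Expression (d) makes the "crude average" explicit: a small mean gain per woman corresponds to roughly 17 years for each woman whose death from breast cancer is avoided.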
3.5 Other considerations
The Edinburgh trial was the only UK trial in an age group that is within that of the national screening programme. However, as discussed in section 3.2, we excluded this trial because problems in the cluster randomisation led to a severe imbalance in socioeconomic status of the women between the groups, and socioeconomic status influences, in opposite directions, the risk of developing breast cancer and of dying from breast cancer. At 14 years of follow-up, the unadjusted results showed a 13% reduction in breast cancer mortality. However, on adjusting for socioeconomic status, the rate ratio was 0.79 (95% CI 0.60–1.02), a RR reduction of 21% (Alexander et al, 1999). Thus, although doubts must remain about the validity of this latter estimate, we note that it is very much in line with the figure of 20% we have used above.
In the preceding sections, we have focused exclusively on breast cancer mortality. Owing to the concerns about whether such deaths are reliably adjudicated in the trials, some authors have suggested that this has led to exaggerated estimates of the RR reduction, and that the outcomes of death from any cancer, or death from any cause, are the appropriate ones for judging the impact of breast screening on mortality. The panel disagrees with this: evaluating all-cancer or all-cause deaths in the trials will lack power because breast cancer deaths represent only a small proportion within these categories. In particular, a 20% RR reduction in breast cancer deaths for ages 55–79 years would yield only 3.0% and 1.2% RR reductions in all-cancer and all-cause deaths, respectively. The trials are not of sufficient size (in terms of numbers of women and length of follow-up) to allow such small RR reductions to be reliably estimated. Hence, a statistically non-significant effect for all-cancer or all-cause deaths in the trials cannot be interpreted as evidence against a reduction in breast cancer deaths.
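The dilution argument can be made concrete with a small sketch. It assumes screening has no effect on deaths from other causes, so that the RR reduction in a broader category is the breast cancer RR reduction multiplied by the share of that category's deaths that are due to breast cancer. The shares used below (15% of cancer deaths and 6% of all deaths at ages 55–79) are not stated in the text; they are back-calculated from the 3.0% and 1.2% figures above and should be read as implied values only.

```python
def diluted_rr_reduction(rrr_breast, breast_share):
    """RR reduction in a broader death category, if other causes are unchanged."""
    return rrr_breast * breast_share

def implied_share(rrr_category, rrr_breast):
    """Breast cancer share of deaths implied by a pair of RR reductions."""
    return rrr_category / rrr_breast

share_of_cancer_deaths = implied_share(0.030, 0.20)   # implied, not from the text
share_of_all_deaths = implied_share(0.012, 0.20)      # implied, not from the text

print(f"{share_of_cancer_deaths:.0%} of cancer deaths, "
      f"{share_of_all_deaths:.0%} of all deaths")
print(f"{diluted_rr_reduction(0.20, share_of_cancer_deaths):.1%} all-cancer, "
      f"{diluted_rr_reduction(0.20, share_of_all_deaths):.1%} all-cause")
```

Effects of this size would require far more deaths than the trials accrued to be distinguished from no effect, which is why a non-significant all-cause result is uninformative here.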
Some authors have argued that changes in the incidence of more advanced breast cancer, whether defined as above a certain tumour size or with spread to the ipsilateral axillary nodes, are a useful surrogate indicator of the effect of screening on breast cancer mortality in the trials, as the ultimate risk of dying of breast cancer depends in part on the stage of disease at first presentation. Although, on average, one could expect a breast cancer screening programme to lead to diagnosis of breast cancers at an earlier stage, this approach cannot, however, directly exclude lead time effects. The situation is further complicated by the issue of interval cancers, which have been shown in more than one study, as compared with screen-detected cancers, to be more often high grade, which is itself predictive of a poorer prognosis. However, what is less clear is whether the prognosis of a breast cancer is determined only by the stage when diagnosed, or whether in the absence of a screening programme the underlying biology is the main determinant of outcome, and this in turn influences when the cancers present. Thus, for those cancers diagnosed earlier by screening, it is not clear which, if any, of the clinical markers of prognosis (stage, size, grade etc.) is the best predictor of ultimate outcome; or is it some other fundamental characteristic only assessable by molecular biology?
Therefore, there appears to be little reason to use these surrogate outcomes as evidence for or against the benefits of screening, as substantial assumptions are needed to estimate the consequent effect on breast cancer mortality. Only if one wanted to disregard completely the evidence about breast cancer mortality from the trials, would the use of such surrogate outcomes have value.
There are possibilities of specific harms of screening in terms of induction of other cancers through the X-rays used in mammography or the radiotherapy or drug therapy used to treat breast cancer, and of coronary damage and deaths through radiotherapy (especially of the left breast). These potential harms are discussed in section 5.2.
Statistical and other uncertainties
It is conventional that results from statistical analyses, including meta-analyses, are presented with a measure of statistical uncertainty such as 95% confidence limits. Although these are helpful in giving an impression of the possible influence of the play of chance (given the sample sizes that are available in the studies considered), they fail to represent the uncertainties because of possible biases (from lack of internal validity of the studies) or owing to generalisation from the trials to a new context (external validity). So, the CI given for the RR reduction of breast cancer mortality from a meta-analysis of the trials is an understatement of the uncertainty about the RR reduction that applies to the UK screening programmes. A RR reduction of 20% represents the panel’s judgement of the evidence, and should be regarded as an approximate figure rather than a precise estimate.
3.6 Observational studies
In addition to the trials, the panel also considered the value of observational studies in estimating the impact of screening on breast cancer mortality. The RCTs of mammographic screening were conducted at least 20 years ago and most over 30 years ago. Observational studies may help to quantify the effects of screening in an era with major improvements in diagnostic imaging, clinical care, and patient outcomes, as many of the observational studies are more recent than the trials. Both proponents and critics of screening have suggested that the observational studies are more relevant today than the RCTs. However, these studies are beset by many more biases with consequent problems of interpretation. It is also possible that they are more prone to selective reporting than trials, in that the results obtained determine the enthusiasm of the authors and journals for publication.
The biases inherent in observational studies differ by type of study. All share the common problem of potential lack of comparability of screened and unscreened women. It is this feature that the RCTs are designed to address. Each observational study design has strengths and weaknesses and, within each class, specific studies vary in their methods and credibility. The relative merits and problems of the various observational study designs are hotly contested both in the literature and in the evidence the panel heard.
Ecological and time-trend studies
Some observational studies compare time trends for breast cancer mortality in countries or areas before and after the introduction of screening, or concurrently between areas with and without screening. In the first type of study, extrapolation of time trends demands that decisions are made, for example, about the linearity or otherwise of the trend, the choice of time periods considered as ‘before’ and ‘after’ screening, and the age groups included. In the second type of study, choices have to be made about the areas to include, the time period considered, and the age groups included. Such decisions, which can appear to have been made rather arbitrarily, can have a profound impact on the estimates obtained. Lack of comparability and different time trends in the groups being contrasted could lead to substantial bias. For these reasons the panel does not consider that these types of studies provide reliable evidence on the effect of screening on breast cancer mortality, and amongst observational study designs we focus instead on case–control studies and incidence-based mortality studies.
Case–control studies compare the history of breast screening attendance between women dying of breast cancer and control women who did not die of breast cancer. Case–control studies are prone to a number of potential biases. The main problem with case–control studies is that those attending breast screening are different from those who do not attend. This is referred to as self-selection bias or the ‘healthy screened effect’. Attendance is influenced by social and demographic factors that are also likely to be related to the risk of dying from breast cancer, with the resulting bias potentially exaggerating the estimated effect of screening. Also, the existence of a breast screening programme in an area may be associated with better treatment of breast cancer. Therefore, women diagnosed with breast cancer in an area with a breast screening programme may also receive more effective treatment than women in areas where there is no such programme. This would bias the study in favour of screening. Attempts are made to correct for the resulting biases by choice of controls and statistical adjustment (Connor et al, 2000; Duffy et al, 2002).
Some of the expert witnesses who gave evidence to the panel felt that case–control studies provided the most reliable form of observational data while others believed the opposite. The panel undertook a review of the individual characteristics of a number of case–control studies to assess the potential bias of each one (Appendix 3). In general, the studies matched controls to cases by both age and residence but some matched on just one of these variables. Self-selection bias was discussed in around three-quarters of the studies and statistically controlled for, using a variety of methods, in less than half of the studies (Appendix 3).
The case–control studies show a more favourable benefit of screening than the trials. The panel believes that this is plausibly because of inadequate control for self-selection bias rather than because screening is actually far more beneficial now than in the trials. Attempts to correct for self-selection bias were based on information outside of the study itself (either from a previous time period, or from other geographical areas) that may not be fully relevant. When adjustment was made, the apparent benefit of screening was diminished. The bias that screening could be associated with better treatment was controlled for in studies conducted in countries with uniform treatment services.
In conclusion, the panel notes that the beneficial effects of screening are in the same direction as those seen in the trials, but that control for self-selection bias may be inadequate in many of the studies.
Incidence-based mortality studies
Njor et al (2012) conducted a review of European studies on the impact of service mammography screening on breast cancer mortality using incidence-based mortality. In these studies, only breast cancer deaths occurring in women with breast cancer diagnosed after their first invitation to screening are included. They classified the studies according to type of comparison group. These were (1) women not yet invited, (2) historical data from the same region as well as historical and current data from a region without screening, and (3) historical comparison group combined with data for non-participants.
They found that the effect of screening on breast cancer mortality varied across studies. The RRs were 0.76–0.81 in group 1; 0.75–0.90 in group 2; and 0.52–0.89 in group 3. Study databases overlapped in both Swedish and Finnish studies, adjustment for lead time was not optimal in all studies, and some studies had various other methodological limitations. There was less variability in the RRs after allowing for the methodological shortcomings. On the basis of evidence from the most reliable incidence-based mortality studies, they concluded that the most likely impact of European breast screening programmes was a breast cancer mortality reduction of 26% (95% CI 13–36%) among women invited for screening and followed up for 6–11 years.
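The relative risks and percentage reductions quoted above are linked by simple arithmetic: a relative risk of RR corresponds to a mortality reduction of (1 − RR) × 100%. The short sketch below applies that conversion to the ranges reported for the three comparison groups and converts the pooled 26% estimate and its confidence limits back to relative risks. The function and variable names are illustrative only and are not taken from Njor et al (2012).

```python
def rr_to_percent_reduction(rr):
    """Convert a relative risk into a percentage mortality reduction."""
    return round((1.0 - rr) * 100.0, 1)


def percent_reduction_to_rr(percent):
    """Convert a percentage mortality reduction back into a relative risk."""
    return round(1.0 - percent / 100.0, 2)


# RR ranges by comparison group, as quoted in the text above
rr_ranges = {
    "group 1": (0.76, 0.81),
    "group 2": (0.75, 0.90),
    "group 3": (0.52, 0.89),
}

reductions = {}
for group, (low_rr, high_rr) in rr_ranges.items():
    # A lower RR means a larger reduction, so the bounds swap order.
    reductions[group] = (
        rr_to_percent_reduction(high_rr),
        rr_to_percent_reduction(low_rr),
    )

# Pooled estimate quoted in the text: 26% reduction (95% CI 13-36%)
pooled_rr = percent_reduction_to_rr(26)
pooled_rr_ci = (percent_reduction_to_rr(36), percent_reduction_to_rr(13))

for group, (smallest, largest) in reductions.items():
    print(f"{group}: {smallest}% to {largest}% reduction")
print(f"pooled RR {pooled_rr}, 95% CI {pooled_rr_ci[0]} to {pooled_rr_ci[1]}")
```

Expressed this way, the group 1 range corresponds to reductions of 19–24%, and the pooled estimate to a relative risk of 0.74.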
Many observational studies have been published, and their conclusions hotly contested. In general, the more contemporaneous case–control and incidence-based mortality studies support the evidence from the trials that screening does have a beneficial effect on mortality. The panel’s view is that the trials provide more reliable evidence for an estimate of mortality reduction. Nevertheless, the observational studies support the hypothesis that screening continues to be beneficial in an era of improved treatment.
The purpose of breast screening is to detect cancer early, before it has come to clinical attention. If all cancers would eventually be clinically recognised and treatment was the same and equally effective no matter when the tumour was diagnosed, then screening would be redundant. However, the understanding is that if the cancer is diagnosed earlier, then treatment will be more effective. This is the assumption on which screening is based. The evidence reviewed in section 3 supports that assumption.
As cancers are detected earlier because of screening, we expect the cancer incidence to be higher among screened women during the screening period (the time period between the detection of a cancer at screening and when it would have presented clinically is the ‘lead time’ and is an inevitable part of screening). In principle, when screening ceases the incidence should fall back so that by the end of the screening period plus lead time, the cumulative incidence in the screened and control populations should be the same.
Some screen-detected cancers, however, may never progress to become symptomatic (clinically detectable) while some women would die from another cause before the cancer became evident. This adverse consequence (harm) of screening is called overdiagnosis or overdetection. It is variously defined as the ‘detection of cancers on screening that would not have been found were it not for the screening test’ (IARC, 2002), or ‘that would never have clinically surfaced in the absence of screening’ (Seigneurin et al, 2011) or ‘that would not have presented clinically during the woman’s lifetime (and therefore would not have been diagnosed in the absence of screening)’ (Biesheuvel et al, 2007). Thus, it refers to all cancers, invasive or in situ.
Underpinning the concept of overdiagnosis is the belief that cancers grow at variable rates, as depicted, for example, in Figure 3A (Esserman et al, 2009; Elmore and Fletcher, 2012). Some screen-detected cancers may progress so slowly that they would never have presented clinically; theoretically, some may be static or even regress but the practical effect is the same. Detection of these cancers turns women into patients, leads to surgery and other treatments that by definition are not beneficial for these women and can cause harm, and adversely affects their quality of life.
As cancers are diagnosed earlier owing to screening, we expect cancer incidence to be higher among screened than unscreened women during the screening period. However, when screening ceases, the incidence should fall back (sometimes referred to as the compensatory drop). If there is no overdiagnosis, the cumulative incidence in the screened and unscreened women will equalise after screening ceases, after a period equivalent to the lead time has elapsed (Figure 3B, left). If there is overdiagnosis, however, the cumulative incidence will remain higher in the screened group and not equalise over time (Figure 3B, right).
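The compensatory drop can be illustrated with a deliberately simple toy model. Every number and assumption in it is hypothetical and chosen only for illustration, not drawn from any trial: 100 cancers would present clinically each year, screening runs for 10 years, each screen finds every cancer due to present within the next 2 years (a fixed lead time), and overdiagnosis, where present, adds 10 cancers per screening year.

```python
from itertools import accumulate


def yearly_diagnoses(base, screen_years, lead, total_years, overdiagnosed_per_year=0):
    """Diagnoses per year in a screened cohort under a toy model.

    Hypothetical assumptions: `base` cancers would present clinically each
    year; screening runs in years 0..screen_years-1; a screen in year s
    finds every cancer due to present in years s+1..s+lead.
    """
    counts = [0] * total_years
    for present_year in range(total_years + lead):
        # Earliest screen that could pick this cancer up before it presents
        earliest_screen = max(present_year - lead, 0)
        screen_detected = earliest_screen < present_year and earliest_screen < screen_years
        diagnosis_year = earliest_screen if screen_detected else present_year
        if diagnosis_year < total_years:
            counts[diagnosis_year] += base
    # Overdiagnosed cancers are found only while screening is running
    for year in range(min(screen_years, total_years)):
        counts[year] += overdiagnosed_per_year
    return counts


BASE, SCREEN_YEARS, LEAD, TOTAL_YEARS = 100, 10, 2, 20  # all hypothetical

unscreened = list(accumulate([BASE] * TOTAL_YEARS))
no_overdx = list(accumulate(yearly_diagnoses(BASE, SCREEN_YEARS, LEAD, TOTAL_YEARS)))
with_overdx = list(accumulate(yearly_diagnoses(BASE, SCREEN_YEARS, LEAD, TOTAL_YEARS, 10)))

# Cumulative excess of diagnoses in the screened cohort, year by year
excess_none = [s - u for s, u in zip(no_overdx, unscreened)]
excess_over = [s - u for s, u in zip(with_overdx, unscreened)]

print("excess without overdiagnosis:", excess_none)
print("excess with overdiagnosis:   ", excess_over)
```

In this sketch the excess without overdiagnosis returns to zero once the screening period plus lead time has elapsed, whereas with overdiagnosis a permanent excess remains. Real cohorts differ in many ways, not least because lead time varies between cancers.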
Some overdiagnosis is seen as inevitable – some women will die before their screen-detected cancer would have presented symptomatically. Establishing its frequency is critically important in weighing up the benefits and harms of screening, both for populations and individual women. A big challenge is to get unbiased estimates of the risk. Opinions on the frequency of overdiagnosis range from it being trivial and unimportant to women to being very important and swamping any benefit of screening.
Whether a particular woman has had an overdiagnosed cancer, or whether individual tumours are overdiagnosed, cannot be judged. It is only possible to estimate the frequency of overdiagnosis. The issue for the UK screening programmes is the magnitude of overdiagnosis in women who have been in a screening programme from age 50 to 70, then followed for the rest of their lives. There are no data to answer this question. Any estimate will therefore be, at best, provisional.
4.2 Sources of data on overdiagnosis
Overdiagnosis can be estimated from RCTs or observational studies. Valid estimates require that the underlying risks of breast cancer are similar in the screened and unscreened women, and that the effect of lead time has been accounted for (Puliti et al, 2012). Overdiagnosed cancers are not all those detected earlier by screening but the subset that would not otherwise have been detected at all.
Randomised controlled trials have the advantage that by design they compare groups of women with the same average prognosis. There are disadvantages of the available RCTs though, including a screening phase that was always shorter than that employed in the NHS national screening programmes, and which varies across the RCTs.
The most reliable estimates of overdiagnosis are from those RCTs in which there was no screening of the control group at the end of the screening period. As screening advances detection of breast cancer, follow-up should extend beyond the screening period to allow a catch up of diagnoses in the unscreened group. In essence, this extended follow-up is needed to distinguish earlier diagnosis from overdiagnosis. If allowance is not made for such catch up, the extra cancers diagnosed in the screened group include some that would also have emerged without screening, albeit later. In principle, the extended period of follow-up should correspond to the lead time, but the average lead time is also the subject of debate, and the lead time is not the same for all cancers. As follow-up is extended well beyond the screening period, new cancers in both the screened and unscreened groups will be included regardless of screening, and the ratio of total numbers of diagnosed cancers will converge towards one (Puliti et al, 2012). An ideal follow-up would be to the end of women’s lives. However, pragmatically an adequate follow-up is perhaps 5–10 years after the end of the intervention period (Biesheuvel et al, 2007; Puliti et al, 2011). The trials that clearly did not invite the control group for screening at the end of the screening phase were the two Canadian trials and the Malmö I trial for women aged 55–69 years (Miller et al, 2000, 2002; Zackrisson et al, 2006).
In the other RCTs, all the women in the control group were offered screening at the end of the active period of the trial. Estimates of overdiagnosis from these trials are problematic. Screening of women in the control group might itself be expected to lead to some overdiagnosis, and thus to an overall underestimate of overdiagnosis. Exclusion of cancers diagnosed at the end-of-trial screening of the control group would overestimate overdiagnosis, as the control women have not been followed long enough.
Besides the RCTs, there are many non-randomised (observational) studies that have attempted to estimate overdiagnosis. These studies raise many concerns, according to the study design, with the key concern being the likely non-comparability of groups, for example, in different geographical areas. As one contributor to overdiagnosis is the development of other diseases leading to death, the risk of overdiagnosis might be age-dependent. Estimates of overdiagnosis may thus be affected by the age distribution of the screened group. For non-RCTs it is especially important that age distributions are comparable.
4.3 Estimating overdiagnosis
Overdiagnosis can be estimated by comparing the incidence of breast cancer in cohorts of screened and unscreened women who were followed for several years. Unfortunately, although there is agreement on the concept of overdiagnosis, there has been a wide divergence of views on how to estimate the amount of overdiagnosis, with the result that estimates of the frequency of overdiagnosis vary widely, from about 0 to 50%.
The estimated amount of overdiagnosis depends greatly on the way the calculation is made, and many different methods exist. De Gelder et al (2011) (Appendix 4) described seven approaches, all of which have been applied in recent publications. The differences relate to which cases are included in the numerator and, especially, to the choice of denominator. The rate of overdiagnosis can be considered in relation to women invited to be screened, women actually screened, or cancers actually detected by screening. It can also relate to lifetime or the screening age range. It can be expressed as a percentage of the cancers diagnosed in the screening group or as the percentage excess over that seen in the unscreened group. Also, it can be expressed as a relative increase or an absolute increase. Clearly, the different estimates address different questions. Understanding published estimates of overdiagnosis percentages requires identification of exactly how those estimates were derived.
The panel believes that there is no single best way to estimate overdiagnosis. For RCTs, the main options are:
From the population perspective, the proportion of all cancers diagnosed during the screening period and for the rest of the woman’s lifetime in women invited to screening who are overdiagnosed (not including any diagnosed before the age of screening). This probability can be estimated using the difference in cumulative numbers of newly diagnosed breast cancers in groups invited or not invited to be screened, expressed either as a percentage of the number of cancers in the control group (excess risk) or as a percentage of the number of cancers in the screening group (proportional risk). This probability will diminish over time as the number of newly diagnosed cancers increases in both groups.
From the perspective of a woman invited to be screened, the probability that a cancer diagnosed during the screening period represents overdiagnosis (Welch et al, 2006; Harris et al, 2011). This probability can be estimated using the difference in cumulative numbers of newly diagnosed breast cancers in groups invited or not invited to be screened, expressed as a percentage of the cancers diagnosed during the screening phase of the trial for women in the invited group. The cases in the invited group can also be restricted to those actually detected at a screening visit – that is, excluding interval cancers or cancers among women who did not attend for screening.
These approaches use the same numerator but varying denominators. The panel considers that the appropriate calculations should include DCIS cases, but notes that some studies have reported estimates of overdiagnosis in relation to invasive cancers only.
The panel illustrates how different approaches yield various estimates using data from the Malmö trial (Andersson et al, 1988; Zackrisson et al, 2006), partly following Welch (Welch et al, 2006; Welch and Black, 2010). All cancers, both invasive and non-invasive DCIS, are considered. Also, for transparency, the calculations are expressed in terms of numbers of women whereas some authors have reported rates per 1000 woman years of follow-up.
The Malmö I trial included women aged 45–69 at entry. Cancer incidence was reported after an average of 15 years of follow-up (to December 2001) (Zackrisson et al, 2006). In the active screening period up to 1990, there were 741 cancers diagnosed in the screening group and 591 in the control group, an excess of 150. In the period from 1990 to 2001, a further 579 and 614 new cancers were diagnosed, respectively, showing a catching up of 35 cancers. The total numbers of cancers in the screened and control groups were 1320 and 1205, respectively, showing an overall excess of 115 cancers diagnosed among screened women. Zackrisson et al (2006) reported a RR of 1.10 and interpreted these data as showing an estimated overdiagnosis of 10% (95% CI 1–18%). Reporting such a percentage requires consideration of the denominator: 10% of what (Fletcher, 2011)? In fact, the figure of 10% represents the estimated excess risk of a diagnosis of breast cancer among women who had been invited to be screened, and were followed for 15 years after the trial ended. The figure of 10% thus addresses the first key question stated above – population impact.
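The arithmetic behind these figures can be retraced in a few lines. The sketch below uses only the counts quoted above. The ratio of total counts only approximates the reported RR, because it treats the two arms as equal in size, which is an assumption of this sketch and not a statement from Zackrisson et al (2006).

```python
# Cancers diagnosed (invasive and DCIS), as quoted in the text above
screening_period = {"screened": 741, "control": 591}  # active screening, up to 1990
follow_up_period = {"screened": 579, "control": 614}  # 1990 to 2001

# Excess in the screened group while screening was running
excess_during_screening = screening_period["screened"] - screening_period["control"]

# Catch-up: extra diagnoses in the control group after screening stopped
catch_up = follow_up_period["control"] - follow_up_period["screened"]

total_screened = screening_period["screened"] + follow_up_period["screened"]
total_control = screening_period["control"] + follow_up_period["control"]
overall_excess = total_screened - total_control

# Crude ratio of counts (assumes arms of equal size; see lead-in)
crude_ratio = total_screened / total_control
excess_risk_percent = 100 * overall_excess / total_control

print(f"excess during screening: {excess_during_screening}")
print(f"catch-up after screening: {catch_up}")
print(f"totals: {total_screened} vs {total_control}, overall excess {overall_excess}")
print(f"crude ratio {crude_ratio:.2f}, excess risk {excess_risk_percent:.1f}%")
```

The overall excess of 115 is the excess of 150 during screening less the catch-up of 35, and the crude ratio of about 1.10 is in line with the reported RR.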
The panel calculated four estimates of percentage overdiagnosis from the Malmö I trial (Table 2A). The younger women (age 45–54) were offered screening at the end of the study period so the estimates are shown both for all women (age 45–69 at enrolment) and only for women aged 55–69. Different definitions of overdiagnosis lead to estimates ranging from 9 to 29%, although they are based on the same trial.
To answer the second key question – from the perspective of a woman being screened, what is the probability that a cancer diagnosed during the screening period represents overdiagnosis – it is important to include screen-detected cancers and interval cancers. Among women being screened, whether in a trial or a routine screening programme, not all of the diagnosed cancers will be detected at the routine screening; many cancers will be picked up between screens, as ‘interval’ cancers and might have presented symptomatically in the absence of screening. The relative proportion of interval to screen-detected cancers will increase as the screening interval increases (Breast Screening Frequency Trial Group, 2002) – in general more screen-detected cancers implies fewer interval cancers – so excluding interval cancers will give an estimate of overdiagnosis that depends on screening frequency. Further, clinical experience suggests that suspicion of cancer may encourage a woman to accept the invitation to screen. To address the second key question, the panel therefore prefers to use, as the denominator for the risk of overdiagnosis among women invited for screening, the number of cancers diagnosed in invited women throughout the period of screening.
4.4 Estimates of overdiagnosis
The literature on overdiagnosis has been reviewed by several authors since 2005. They used different study inclusion criteria, but gave most attention to data from RCTs. Moss (2005) calculated overdiagnosis for eight RCTs as did Gøtzsche (2004) for six of the same trials. Biesheuvel et al (2007) reviewed the literature with particular attention given to the RCTs and the two former reviews. Recently, Puliti et al (2012) reviewed the European literature covering observational studies. Biesheuvel and Puliti both considered the issue of bias in each of the studies, specifically in relation to adjustment for lead time and case-mix.
Moss (2005) and Gøtzsche (2004) produced very different estimates of overdiagnosis from the same trials. Biesheuvel et al (2007) converted all their estimates to a common measure of overdiagnosis (method A described below), but important discrepancies remained. They reported that in the studies they considered least biased, overdiagnosis estimates ranged from −4 to 7.1% for women aged 40–49 years, 1.7 to 54% for women aged 50–59 years, and 7 to 21% for women aged 60–69 years. Similar large variations have been seen in the estimates of overdiagnosis from observational studies (Puliti et al, 2012). Some of the variation seen in these age-specific estimates stems from very small numbers of cases within age groups within trials.
Given the wide variation in both the methods used and the estimates obtained, the panel calculated four estimates of percentage overdiagnosis:
A. Excess cancers as a proportion of cancers diagnosed over whole follow-up period in unscreened women
B. Excess cancers as a proportion of cancers diagnosed over whole follow-up period in women invited for screening
C. Excess cancers as a proportion of cancers diagnosed during screening period in women invited for screening
D. Excess cancers as a proportion of cancers detected at screening in women invited for screening
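The four definitions share a numerator and differ only in the denominator, which a short function makes explicit. The sketch below is illustrative and is not the panel's own calculation: the function and parameter names are hypothetical, and the inputs are the Malmö I counts for all women aged 45–69 quoted in section 4.3. The number of cancers detected at a screening visit is not given in the text, so method D is left uncomputed in the example.

```python
def overdiagnosis_estimates(screened_total, control_total,
                            screened_in_screening_period, screen_detected=None):
    """Return the excess cancers and percentage overdiagnosis by methods A-D.

    screened_total / control_total: cancers over the whole follow-up period
    screened_in_screening_period:   cancers in the invited group during screening
    screen_detected:                cancers found at a screening visit (optional)
    """
    excess = screened_total - control_total  # common numerator
    estimates = {
        "A": 100 * excess / control_total,                 # vs unscreened, whole follow-up
        "B": 100 * excess / screened_total,                # vs invited, whole follow-up
        "C": 100 * excess / screened_in_screening_period,  # vs invited, screening period
    }
    if screen_detected is not None:
        estimates["D"] = 100 * excess / screen_detected    # vs screen-detected cancers
    return excess, estimates


# Malmo I, all women aged 45-69 at entry (counts quoted in section 4.3)
excess, estimates = overdiagnosis_estimates(
    screened_total=1320,
    control_total=1205,
    screened_in_screening_period=741,
)

print(f"excess cancers: {excess}")
for method, value in estimates.items():
    print(f"method {method}: {value:.1f}%")
```

Because the denominator shrinks from method A to method D, the same excess of cancers produces a progressively larger percentage, which is why estimates from a single trial can differ several-fold.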
RCTs without screening of control group at the end of the trial
The most reliable estimates of overdiagnosis come from RCTs in which women in the control group were not offered screening at the end of the trial. Three trials clearly meet this criterion: Malmö I, for women aged 55–69 years, and the two Canadian trials that screened women for 5 years and reported follow-up data at 11 years (i.e., about 6 years after the end of screening; Miller et al, 2000, 2002). The estimates of overdiagnosis from these two trials were quite similar to those from Malmö I.
The situation with the HIP study was less clear from the available literature, so the panel excluded this study for the purposes of the estimate of overdiagnosis. In addition, the panel had difficulty from the published literature extracting the data on the numbers of cancer cases in the two arms using the same definition of cases as the other three studies. In particular, the first report of the HIP study included both DCIS and lobular cancer in situ (LCIS) in the non-invasive cases (Shapiro, 1977; Shapiro et al, 1982), but thereafter we could not determine whether LCIS cases had been included in the subsequent incidence data, nor whether non-invasive cases had been included in the process of cross-checking with the New York Cancer registry data and National Death index (Chu et al, 1988). Estimates of overdiagnosis from the Malmö I and the two Canadian trials using the four methods already described are shown in Table 2B. The estimates from the three RCTs are quite similar.
Opportunistic screening in the control group would lead to an underestimate of overdiagnosis. In the Malmö and Canadian trials, about 25% (26% and 17%, respectively, in the two Canadian trials) of the women in the control group reported having received a mammogram both during the active trial period and follow-up period. No allowance has been made in the above calculations for that effect.
All four methods use the same numerator, derived from the difference in newly diagnosed cases of breast cancer in the group invited for screening and the control group. Methods A and B differ in whether they compare the excess against the number of cancers diagnosed in the control group or the screening group. Many published estimates use the former (method A).
None of the methods are wrong – they just address different questions. The panel’s preferred measures are method B to address the population perspective and method C for the perspective of an individual woman. Figure 3C shows the results from random effects meta-analyses for these two estimates of overdiagnosis.
As many have noted, these three RCTs offer the most reliable evidence for an estimate of overdiagnosis. The combined data suggest a risk of overdiagnosis of about 11% with a population perspective and 19% from the individual woman’s perspective.
The panel considers the data consistent with overdiagnosis of about 5–15% from the population perspective and 15–25% from the individual woman’s perspective. These estimates are subject to the same sources of uncertainty as noted for the estimates of mortality from the RCTs. In addition, the estimates are not tailored to the UK screening scheme or a 20-year screening period.
In total, these three trials included only 1200 cancers diagnosed during the screening period of which an estimated 243 were overdiagnosed. Given these small numbers, it is important to consider other estimates from other RCTs and the higher quality observational studies. However, those studies clearly provide less reliable estimates.
RCTs with screening of control group at the end of the trial
In several RCTs, all the women in the control group were offered screening at the end of the active phase of the trial. Estimates of overdiagnosis from these trials are problematic. Exclusion of cancers detected at the end-of-trial screening of the control group would overestimate overdiagnosis, as the control women have not been followed long enough. Such an effect is clearly seen in the RCTs without end-of-trial screening. On the other hand, inclusion of cases detected at the end-of-trial screen of women in the control group means that screening is not being compared with no screening. Also, some of the cancers detected by that screen would themselves be overdiagnosed. Thus, including these cancers would lead to an underestimate of overdiagnosis.
Although for several trials both calculations just described are possible, the estimates obtained generally vary widely. For example, for the Stockholm trial using method B, the estimate of overdiagnosis varies from −2.6% from all diagnosed cancers to +39% if cancers detected at the end-of-trial screen of the control group are excluded. Although it is reasonable to believe that these two estimates bracket the desired answer (had there been no extra screen and with extended follow-up), the panel believes it is impossible to get useful and reliable estimates of overdiagnosis from these trials. An alternative approach is to estimate the effect of lead time and adjust for it. That approach makes very strong, unverifiable assumptions, and the panel is not persuaded that such an adjustment can be made reliably.
Overdiagnosis can be estimated from some non-RCTs, but as always with observational studies there are serious concerns about comparability. Numerous observational studies have adopted a variety of study designs to compare screened and unscreened women or, more often, women who were or were not invited to screening.
There is a considerable body of literature examining the effects of screening in populations and trying to assess the degree of overdiagnosis. Even in the absence of screening, breast cancer incidence rates are not stable over time in populations, and the wide variation in quoted overdiagnosis rates reflects this variation as well as different lengths of follow-up, different statistical assumptions, and different ways of accounting for lead time.
When screening is introduced there will be a short-term rise in the incidence of newly diagnosed cancers. If that rise is solely due to advancing the time when some cancers are diagnosed the increase should fall back to pre-screening levels after some years. A failure to do so may be interpreted as evidence of a degree of overdiagnosis (Esserman et al, 2009). Time trends can also be examined for women of different age groups: before, during, and after the screening programme age range. Such data are shown in Figure 1A in section 2 for breast cancer incidence in the United Kingdom. The increase in incidence associated with the introduction of population screening is clearly seen, first for women aged 50–64 and later for women aged 65–69.
Some studies have compared post-screening incidence with a projection of previous incidence trends in the screened population. Those studies have resulted in very different estimates of overdiagnosis. The panel asked Cancer Research UK to review a set of plausible assumptions made in the literature and to produce estimates based on these assumptions (Jørgensen and Gøtzsche, 2009a; Duffy et al, 2010). The panel found that by changing each of the assumptions, one could get a vast range of estimates of overdiagnosis (Appendix 6). The modelling produced a range of estimates of the impact of the current NHS breast screening programme in England, from 0 to >6550 women (aged 45) per year. Ten per cent of the results were <1150 and ten per cent >4115. As there appears to be no a priori reason to favour one set of assumptions over another, the panel does not think that approaches based on extrapolation offer a robust method to estimate overdiagnosis.
Several groups have compared breast cancer incidence trends over time in screened and unscreened countries or regions over the same time period (Jørgensen and Gøtzsche, 2009). The difficulty with these studies is distinguishing true overdiagnosis from the excess incidence of breast cancer that results from screening bringing forward the time of diagnosis. Given that overdiagnosis is defined as a cancer that would not have come to attention in the woman’s life span, long follow-up after cessation of screening is essential. The difficulties can be illustrated by studies comparing incidence rates in regions within a single country that did or did not introduce population screening. A study from Denmark is illustrative, as only 20% of the Danish population was offered organised mammography screening over a long time-period (Jørgensen et al, 2009). Screening was introduced in Copenhagen in 1991 and in Funen in 1993 for women aged 50–69. The authors noted that the population in those areas has distributions of age and socioeconomic status comparable with the rest of Denmark.
Table 2C shows the numbers of breast cancers diagnosed per 100 000 women in screened and non-screened areas of Denmark for 20 years before and 13 years after the introduction of screening in 1991. Incidence rates of breast cancer were higher in the screened areas than in the non-screened areas before screening began, suggesting some non-comparability of the areas. During the 13 years of screening, the incidence in women aged 50–69 rose both in the screened areas and the non-screened areas, but more in the screened areas. Incidence also rose in women aged 70–79. One way to estimate overdiagnosis is to compare the ratio of new cancers in screened and unscreened groups in the two periods. In the pre-screening period, the ratio was 1.08 (214/198) and for the screening period it was 1.35 (386/286). The authors say that these data indicate 35% overdiagnosis, but if we adjust for the pre-screening difference the excess is 25% (1.35/1.08=1.25). These simple calculations ignore the underlying rise in cancer incidence throughout the period. The authors used regression modelling to take account of incidence trends and age differences, giving an estimate of 33%. As noted earlier, such analyses make additional assumptions that are not verifiable. Studies such as this do not indicate the likely effect of long-term follow-up in reducing the excess in the incidence rate in the screened compared with the unscreened populations.
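The two simple calculations described above can be reproduced from the four incidence figures quoted in the text (new cancers per 100 000 women). The sketch below is illustrative only. Like the calculations it mirrors, it ignores the underlying rise in incidence and is not the regression model used by the authors. Variable names are our own.

```python
# Breast cancers per 100 000 women, as quoted in the text above
pre_screening = {"screened_areas": 214, "non_screened_areas": 198}
screening = {"screened_areas": 386, "non_screened_areas": 286}

# Ratio of incidence in screened to non-screened areas, in each period
ratio_pre = pre_screening["screened_areas"] / pre_screening["non_screened_areas"]
ratio_screening = screening["screened_areas"] / screening["non_screened_areas"]

# Unadjusted excess: treats the whole screening-period ratio as overdiagnosis
unadjusted_excess_percent = 100 * (ratio_screening - 1)

# Adjusted excess: divides out the difference that existed before screening
adjusted_ratio = ratio_screening / ratio_pre
adjusted_excess_percent = 100 * (adjusted_ratio - 1)

print(f"pre-screening ratio: {ratio_pre:.2f}")
print(f"screening-period ratio: {ratio_screening:.2f}")
print(f"unadjusted excess: {unadjusted_excess_percent:.0f}%")
print(f"adjusted excess: {adjusted_excess_percent:.0f}%")
```

The adjustment matters because a tenth of the apparent excess was already present before any woman was screened, so attributing the full 35% to screening would overstate its effect even before lead time is considered.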
There have been many other observational studies, but most have the type of problem illustrated here in distinguishing overdiagnosis from the expected increase in breast cancer incidence due to screening and require many assumptions to derive estimates of overdiagnosis. A recent review of 13 observational studies showed overdiagnosis to vary in the range of 0–54%. Adjustment for lead time and breast cancer risk yielded overdiagnosis estimates in the range of 1–10% (Puliti et al, 2012).
The panel’s judgement is that the best estimates will come from long-term follow-up of RCTs, as reviewed above.
Statistical and other uncertainties
As noted in section 3, it is conventional that results from statistical analyses, including meta-analyses, are presented with a measure of statistical uncertainty such as 95% confidence limits. Although these are helpful in giving an impression of the possible influence of the play of chance (given the sample sizes that are available in the studies considered), they fail to represent the uncertainties due to possible biases (internal validity of the studies) or to generalisation from the studies to a new context (external validity). So the CIs given for the estimated percentage overdiagnosis are an understatement of the uncertainty about the risk of overdiagnosis associated with the UK screening programmes. Estimates of overdiagnosis have additional uncertainties relating to which estimate to use, and the data are not available for all studies to calculate overdiagnosis in the suggested ways.
The panel believes that overdiagnosis occurs, and that women need to be aware that screening carries a risk of detecting cancers, invasive and in situ, which would not have troubled them in their lifetime. Tumours that represent overdiagnosis cannot be identified clinically and so will have to be managed according to current clinical protocols.
The panel considers that the data from three of the RCTs without end-of-trial screening of controls provide the most reliable estimates of the extent of overdiagnosis, but notes that there is a rather limited amount of data and numerical estimates are subject to several uncertainties in common with estimates of mortality benefit.
As noted for the estimated benefit for mortality (see section 3.2), the overdiagnosis rates estimated from old RCTs may not reflect those in current screening programmes. There is, however, no clear evidence to suggest that the current rate of overdiagnosis would be lower or higher than in the original trials. The panel thinks that the best estimate of overdiagnosis for a population invited to be screened is of the order of 11%, defined as the percentage excess incidence in the screening population above the long-term expected incidence in the absence of screening.
An alternative definition addresses the answer to the question ‘if I am invited to enter into the screening programme and am given a cancer diagnosis during the screening period, what is the likelihood of overdiagnosis’? The panel views the evidence as suggesting that this probability is of the order of 19%.
4.5 Consequences of overdiagnosis
As previously stated, detection of overdiagnosed cancers turns women into patients, leads to surgery and other treatments that are not therapeutically beneficial for these women and can cause harm, and adversely affects their quality of life. As cancers that would not go on to cause cancer death cannot be individually identified, they are treated according to the current treatment protocols. Figure 3D summarises the management of UK screen-detected cancers, both invasive and non-invasive, in 2010/2011 (NHS Breast Screening Programme & Association of Breast Surgery-West Midlands Cancer Intelligence Unit, 2012).
One cannot, however, assume that the overdiagnosed cancers would be managed in the same proportional way as the generality of screen-detected cancers. The fact that the patient dies before the cancer would have presented clinically implies that such tumours:
would tend to be more slowly growing, as a more rapidly growing tumour would be more likely to present clinically within a shorter time-frame;
would be relatively small, as larger tumours would be more likely to present symptomatically.
Thus, overdiagnosed cancers would tend to be more likely to be:
DCIS (and the relative excess of DCIS in screen-detected cancers would support this), and possibly more likely to be low/intermediate rather than high grade.
Grade 1 or grade 2 invasive rather than grade 3.
Thus, compared with the diagram, patients with cancers that are overdiagnosed would be:
relatively more likely to have been treated on the DCIS side than the invasive; and as more likely to be low/intermediate grade, less likely to have had radiotherapy;
if invasive, more likely to be managed by WLE and radiotherapy than mastectomy, as they are likely to be small;
if an invasive cancer, less likely to have had chemotherapy, as patients having chemotherapy are more likely to have had grade 3 and/or node-positive cancers (NHS Breast Screening Programme & Association of Breast Surgery-West Midlands Cancer Intelligence Unit, 2012);
if an invasive cancer, more likely to have had endocrine therapy, as oestrogen positivity is associated with older age and lower grade invasive cancers.
Evidence in support of this tendency for overdiagnosed cancers to be of potentially better prognosis, and thus given less aggressive therapy can be seen, for example, in the reports of the nature of cancers found in the two arms of randomised screening trials. Table 2D shows such data for the Malmö I trial.
4.6 Ductal carcinoma in situ (DCIS)
There is evidence that breast screening has led to an increase in the identification of DCIS (IARC, 2002). It has been suggested that DCIS is a relatively benign condition that would not cause harm, and therefore diagnosis of DCIS contributes significantly to the magnitude of overdiagnosis.
DCIS is a malignant process that arises from the epithelial tissues of the breast, and consists of neoplastic cells, which do not, however, infiltrate beyond the limiting basement membrane, and thus remain within the ducts where they arose. Classification is based on the morphological features: architectural growth pattern and the cytological characteristics of the malignant cells. It is usually grouped by grade into high, intermediate, or low grade (IARC, 2002). Along with LCIS, it is classified as non-invasive breast cancer, and although the cells have the appearance of malignancy, they do not show invasiveness, so carcinoma in situ is not in itself a life-threatening condition. The concern is that at least some have the capacity to progress to invasive malignancy.
DCIS is most commonly detected mammographically as microcalcification. Less commonly, DCIS will present with a symptomatic lump.
Table 2E adapted from ‘The non-invasive breast cancer report’ (National Cancer Intelligence Network, 2011), shows the frequency of non-invasive breast cancer for different age groups and presentations in England for the two years 2006 and 2007.
The majority (about 90%) of non-invasive cancers diagnosed are DCIS. It is apparent that the majority are screen-detected but, nevertheless, 38% were diagnosed symptomatically. Some of the symptomatic tumours may have been detected incidentally when patients presented with a different problem (e.g. microcalcifications found in the contralateral breast when the woman has presented with a benign problem in the one breast). Thus, the detection and management of non-invasive disease is not exclusively a problem of the screening programme. Nevertheless, within the screening age group (age 50–70), the majority (79%) of the DCIS is screen-detected. For 2009–2010, of all screen-detected cancers, about one in five were non-invasive, being a little higher (24%) for the prevalent round and lower (19%) for the incident rounds (The NHS Information Centre). Thus, a mammographic screening programme will detect DCIS. In some cases (about one in five; Evans, 2012), investigation of what is radiologically DCIS will lead to the detection of an invasive carcinoma – the larger the area of DCIS, the more likely that there will be a frankly invasive component.
Natural history of DCIS
Before introduction of the screening programme, DCIS was a relatively uncommon tumour. Since it is frequently a marker of associated invasive cancer, it has been investigated and usually excised, and hence it is not possible to know what would have happened if it had been left undisturbed and untreated. Given that the screening programme is diagnosing much more DCIS than presents symptomatically, the relevant questions are:
How common is DCIS?
As above, it represents about 1 in 5 of screen-detected cancers, but only 1 in 20 of all symptomatic cases (National Cancer Intelligence Network, 2011). In reports of small series (IARC, 2002) of women without known breast cancer who underwent postmortems (hospital-based or forensic), invasive cancer was found in about 1% and DCIS in 9%, but there was wide variation in the series, presumably reflecting differences in the women selected and methodologies for examining the breast.
How often does it progress to invasive cancer?
The data from trials of therapy (radiotherapy and/or tamoxifen) after WLE of DCIS show that both interventions reduce the risk of local relapse (similar to the findings for invasive cancer after WLE). Relevant to the UK screening programme is the UK, Australia, New Zealand (UK/ANZ) trial (Cuzick et al, 2011), in which after WLE of screen-detected DCIS, without any further treatment, relapse in the breast occurred in about 19% of cases, in half of which the relapse was invasive. Progression appears to occur slowly – for example, one series of screen-detected DCIS (Wallis et al, 2012) showed the median time to invasive progression for high-grade DCIS was 76 months, and for low/intermediate grade 131 months.
Is there any way of identifying those cases of DCIS that will or will not progress/relapse as invasive cancer?
DCIS is classified histologically on the basis of excised specimens, and there is currently no certain means of identifying lesions that would not progress. The risk of invasive relapse is higher with high- or intermediate-grade DCIS. Low-grade DCIS seems to pursue a more indolent course, and when invasive relapse occurs it is likely to be a grade-1 tumour. There is ongoing work (Pinder et al, 2010; Reeves et al, 2012) looking at histological and molecular markers to identify those most likely to progress, especially to invasive disease.
Does DCIS affect survival?
The follow-up of patients with DCIS usually shows excellent survival. For example, in the UK/ANZ trial of 1701 women with a median follow-up of 12.7 years, only 179 (11%) had died, of whom 39 (2% of all cases) died of breast cancer. Long-term follow-up (NHS Breast Screening Programme & Association of Breast Surgery-West Midlands Cancer Intelligence Unit, 2012) of 1603 cases of screen-detected non-invasive breast cancer (nearly all DCIS) showed a 20-year relative survival of 97.2% (95% CI 93.6–100.6), with 7.2% of the 493 deaths being due to breast cancer. However, these series are of patients who have had the DCIS treated: what is unclear is what the risk of dying of breast cancer would have been had it been left untreated.
The main question is whether DCIS is a marker of malignancy requiring active treatment or a benign condition of no clinical significance. On the one hand, DCIS (particularly high grade) can certainly serve as a marker for invasive cancer – either because it is associated with the presence of invasive disease at the time of detection, or because its presence indicates an increased risk of invasive disease developing subsequently – in about 10% of cases at 10 years after WLE only. On the other hand, autopsy series and screening programmes both demonstrate that DCIS can be found in the breast of middle-aged women at a greater frequency than presents symptomatically.
Part of the explanation is time. Breast cancer has a long natural history and in patients with invasive cancer, the evolution of metastatic spread and ultimate death may take place over decades. If one also considers the progression of DCIS to invasive cancer as part of this process, the evolution is even longer. In other words, the relevant question is not whether DCIS progresses to invasive cancer (it can), but whether it might have progressed to an invasive cancer that causes symptoms within the lifetime of the women concerned. This will depend mainly on the age of the woman, her life expectancy at the point of diagnosis, and perhaps other factors that could affect progression (hormonal exposure, obesity, etc.). Current series do not show a significant impact of DCIS on survival, after treatment, even at 20 years, but increasing survival may mean that for women in their 50s and even 60s, the diagnosis of DCIS may impact on their long-term survival. Long-term data are needed.
Thus, in diagnosing DCIS via a screening programme, there is a balance to be struck between the potential benefits for some women of identifying and treating a pre-invasive cancer, and the risks for others of treating something that would never have affected the woman in her lifetime. It is not simply the case that DCIS represents overdiagnosis, although it undoubtedly is a contribution to the cases of overdiagnosis.
5. Other considerations
Beside the benefit of breast screening for mortality and its harm in terms of overdiagnosis, the panel considered other relevant issues. These include additional harms through invitation, screening, diagnosis, and treatment, as well as women’s perceptions and cost effectiveness. Although the panel has not made a systematic appraisal of evidence in all these areas, being outside its terms of reference (Appendix 1), it has drawn together comments on each of these issues as they should not be neglected when considering the overall impacts of breast screening.
5.2 Harms associated with breast screening
Mammography uses X-rays and thus exposes women to very low doses of ionising radiation that could cause breast cancers. The actual dose of radiation depends on several factors including the number of views of each breast and whether film or digital mammography is used.
The Health Protection Agency (Health Protection Agency, 2001) has suggested that the lifetime additional cancer risk for each mammography examination is between 1 in 100 000 and 1 in 10 000.
Although these doses are lower than those for which cancer is directly induced (Preston et al, 2002), screening a large population on a regular basis may cause harm. The NHS Breast Screening Programme (2011) in 2006 stated that for every 14 000 women in the age range 50–70 years screened by the NHSBSP three times over a 10-year period, the associated exposure to X-rays will induce about one potentially fatal breast cancer (NHS Breast Screening Programme & Association of Breast Surgery-West Midlands Cancer Intelligence Unit, 2012). A more recent estimate is that screening women every 3 years from age 47–73 would cause 3–6 cancers per 10 000 women screened (Berrington de Gonzalez, 2011). This risk is incorporated in estimates of the benefit of screening (see section 3). Digital mammography, which uses a lower radiation dose, is increasingly being used in the English screening programme. Therefore, it is likely that the risk of exposure will be reduced.
During the process of mammography, the breast is compressed and flattened in order to create a uniform density, which improves the image and reduces the radiation dose. A substantial proportion of women find this painful and some studies (Nelson et al, 2009; Gøtzsche and Nielsen, 2011) have shown that the pain and discomfort of mammography deters them from attending for further screening.
The assessment process
Many women take part in the screening programme; it is often argued that for many the benefit will be reassurance (Welch et al, 2011). With that reassurance, however, must come the knowledge that all screening tests have errors of false positives and false negatives. The mammogram may sometimes appear to show an abnormality that requires further investigation to determine whether or not it is a cancer requiring treatment, or it may fail to detect a cancer that is present.
In Figure 4, 2522 women (i.e. the 3105 recalled minus the 583 diagnosed with cancer; 3.36% of all the women screened) were recalled and found not to have cancer. This is called a false-positive result. Of the women recalled and found not to have cancer, the majority (1744/2522=69%) had only further imaging (mammography, ultrasound) but a minority (778/2522=31%) had a biopsy, which was core biopsy under local anaesthetic in all except the 2.3% of recalled women without cancer (57/2522) who had a formal biopsy under general anaesthetic. The latter group represents only 0.076% (57/75 057) of all women screened.
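The false-positive breakdown above follows directly from the counts quoted from Figure 4. The sketch below simply re-derives the percentages; the counts are from the text, and the variable names are illustrative.

```python
# Counts quoted in the text from Figure 4.
screened = 75_057        # all women screened
recalled = 3_105         # recalled for assessment
cancers = 583            # recalled and diagnosed with cancer
imaging_only = 1_744     # false positives resolved by further imaging alone
biopsy = 778             # false positives who had any biopsy
open_biopsy = 57         # of those, formal biopsy under general anaesthetic

false_positives = recalled - cancers

# The two assessment routes should account for every false positive.
assert imaging_only + biopsy == false_positives

print(f"false positives:            {false_positives}")
print(f"  share of all screened:    {false_positives / screened:.2%}")
print(f"  imaging only:             {imaging_only / false_positives:.0%}")
print(f"  any biopsy:               {biopsy / false_positives:.0%}")
print(f"  open biopsy:              {open_biopsy / false_positives:.1%}")
print(f"open biopsy, all screened:  {open_biopsy / screened:.3%}")
```

Note that the 2.3% figure uses all false positives as its denominator, not only the women who had a biopsy.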
Numerous studies have assessed the psychological impact of a false-positive result on women (Brett et al, 1998; Brett and Austoker, 2001). The studies’ results are conflicting but a recent systematic review of the literature (Bond et al, 2012) concluded that, in the population at general risk of breast cancer, a false-positive result can cause breast cancer-specific psychological distress, which may endure for up to 3 years. The degree of distress is associated with the level of invasiveness of subsequent assessment. Some studies found that the distress caused by a false-positive result deterred some women from re-attending for breast screening, which would reduce any benefit they would otherwise have got from being offered screening in the first place. The level of distress can be mitigated by providing women with clearly worded information about the recall and appropriate support from clinical staff before and during assessment (Bond et al, 2012).
No screening test is completely accurate and sometimes mammography will not detect a cancer. This may be because the cancer is not mammographically visible or develops between screening rounds; women are warned of this possibility in the screening literature. When women present with an interval cancer, the previous mammograms are reviewed blind to assess whether a suspicious abnormality was visible on the previous screening mammogram. If so, such cases are classified as a true false-negative mammogram, that is, the suspicious abnormality was missed at the first screen. For women attending at three-yearly intervals, the false-negative rate is 0.2/1000 women screened (Lawrence, 2012; cf. the cancer detection rate by screening of 7.8 cancers/1000 women screened).
Core biopsy carries a risk of local haemorrhage and, rarely, reaction to local anaesthetics. Open surgical biopsy involves a general anaesthetic but it is regarded as a low-risk procedure.
Psychological consequences of a positive diagnosis
The psychological consequences of a breast cancer diagnosis and subsequent treatment have been well documented. In terms of harms of screening, these consequences are particularly relevant to those women who have been overdiagnosed. Although these women will not know that the cancer would not have caused them any harm, they will have suffered unnecessary psychological trauma associated with a cancer diagnosis.
Two studies (Yousaf et al, 2005; Schairer et al, 2006) have shown a small but significant increased risk of suicide in patients diagnosed with breast cancer. The risk increases with advancing stage of the disease and therefore may be less relevant for those who are overdiagnosed. However, two further studies (Jamison et al, 1978; de Leo et al, 1991) have found suicidal ideation to be present in some patients post-mastectomy. Although these risks are small they should not be overlooked when assessing the benefits and harms of breast screening.
Potential morbidity and mortality from treatment
As with any surgical procedure, there are hazards from the anaesthetic and the surgical procedure itself. Although the surgery can be extensive (especially if it involves reconstructive surgery as well), the surgery is elective, patients are assessed pre-operatively, and serious complications are rare. The most extensive surgery is mastectomy and reconstruction for which the mortality is estimated to be <0.3% (The NHS Information Centre). In contrast, following mastectomy, 10% of patients will have some sort of complication (e.g. infection, fluid accumulation) (The NHS Information Centre).
Acutely, radiotherapy can cause skin reactions and uncommonly radiation pneumonitis. Both of these are short-lived and usually not severe.
Radiotherapy can cause other long-term harms (Early Breast Cancer Trialists’ Collaborative Group (EBCTCG), 2005). There is, at 15 years, a small excess risk of non-breast cancer mortality (15.9 vs 14.6%, an absolute difference of 1.3%). This is mainly due to heart disease (so seen more in left- than right-sided cases because more of the heart is irradiated), lung, and oesophageal cancers. These estimates are derived from trials of radiotherapy performed mostly during or before the 1970s; since then radiotherapy techniques have changed, especially with the introduction of CT planning, so reducing the volume of heart and lung irradiated, which should reduce, but not eliminate, such complications. Data from the Surveillance Epidemiology and End Results (SEER) database (Giordano et al, 2005) show that the risk of death from ischaemic heart disease due to radiotherapy has diminished from 1973 to 1989 (risk from right-sided tumours unchanged, left sided decreased).
The last published Oxford overview (Clarke et al, 2005) showed that there is a reduction in mortality from the reduction in local recurrence of invasive cancer by radiotherapy. Essentially, for every four recurrences prevented at 5 years, there will be one death prevented at 15 years. For illustration, the local recurrence rate in the radiotherapy START trial (in which many patients had screen-detected cancers) was 3.5% at 5 years, which, given radiotherapy reduces local recurrence by about two-thirds, would correspond to a 5-year local recurrence rate of about 10.5% without radiotherapy. This gain of 7% in local control should correspond to a reduction in mortality of just under 2%.
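The chain of reasoning in this illustration (observed recurrence with radiotherapy, implied recurrence without it, and the resulting mortality reduction) can be written out step by step. This is a sketch of the arithmetic only, using the figures quoted above; the variable names are illustrative, and the four-to-one rule is applied exactly as stated in the text.

```python
# Figures quoted in the text.
recurrence_with_rt = 0.035      # 5-year local recurrence in the START trial
relative_reduction = 2 / 3      # radiotherapy cuts local recurrence by about two-thirds
recurrences_per_death = 4       # four recurrences prevented -> one death prevented

# If radiotherapy leaves one-third of recurrences, the untreated rate is three times higher.
recurrence_without_rt = recurrence_with_rt / (1 - relative_reduction)

# Absolute gain in local control at 5 years.
gain_in_local_control = recurrence_without_rt - recurrence_with_rt

# Implied absolute reduction in mortality at 15 years.
mortality_reduction = gain_in_local_control / recurrences_per_death

print(f"recurrence without radiotherapy: {recurrence_without_rt:.1%}")
print(f"gain in local control:           {gain_in_local_control:.1%}")
print(f"implied mortality reduction:     {mortality_reduction:.2%}")
```

The final figure is the "just under 2%" referred to in the text.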
Adjuvant hormone therapy
The most extensive experience is with tamoxifen. Trials of adjuvant tamoxifen for 5 years have shown that for patients with hormone receptor-positive breast cancer, breast cancer mortality is reduced by about 33% (Early Breast Cancer Trialists’ Collaborative Group (EBCTCG) 2005), translating into an absolute reduction in mortality at 10 years of 5.3% and 12.2% for node-negative and node-positive patients, respectively. Tamoxifen does have some long-term hazards in that it carries an increased risk of uterine cancer and thromboembolic disease. Their effect on mortality is of the order of 0.2% per decade and is outweighed by the modest but positive effect of tamoxifen on ischaemic heart disease (possibly because it lowers cholesterol) (Dewar et al, 1992). Aromatase inhibitors are increasingly used instead of tamoxifen, but their overall effect on mortality is very similar to that of tamoxifen.
Adjuvant cytotoxic chemotherapy reduces both overall and breast cancer-specific mortality. Use of an anthracycline- or taxane-containing regime yields a RR reduction of about one third in breast cancer mortality (Peto et al, 2012). The absolute benefit depends on the risk profile but will often be of the order of 6–7% at 10 years. There are acute toxicities associated with giving chemotherapy — such as alopecia, nausea and vomiting, which are all unpleasant but non-fatal. Acute neutropenic sepsis can be fatal but this is a rare event in the adjuvant setting. There is an increased risk of thromboembolism. Mortality rates during adjuvant chemotherapy have been reported at around 0.3% (Cameron et al, 2003). The main long-term risks are (Azim et al, 2011):
Cardiac: Anthracyclines can cause a cardiomyopathy, the incidence being dose related and increasing with age. Trials suggest an absolute excess mortality of up to 1%, but this may be an underestimate as the incidence of cardiac failure may be higher and can occur many years after treatment.
Second cancers: The main risk with chemotherapy, particularly anthracycline-based, appears to be acute myeloid leukaemia and myelodysplastic syndrome. At standard doses, the risk is probably of the order of 0.5% but may be higher if the doses (especially of alkylating agents and anthracyclines) are increased.
Neurotoxicity and premature menopause: Both are very real causes of morbidity but not of mortality.
We know that within the NHS screening programmes, of patients found to have invasive or non-invasive cancer, 99% have surgery (of whom 5.7% have mastectomy and immediate reconstruction), 72% have radiotherapy, 72% have adjuvant hormone therapy, and 27% adjuvant chemotherapy (NHS Breast Screening Programme & Association of Breast Surgery-West Midlands Cancer Intelligence Unit, 2012). From the above, assuming a worst case scenario, it would be reasonable to assume no adverse mortality effect for hormone therapy, no net effect of radiotherapy on mortality, a maximum of 0.2 per 1000 dying because of surgery (0.3% of those having reconstruction) and 1.3 per 1000 dying because of chemotherapy (0.5% of the 27% who have chemotherapy), giving an adverse mortality rate of 0.15%. For patients who have an ‘overdiagnosed’ cancer, the risk is likely to be lower as it is unlikely that they would have received chemotherapy (see section 4).
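The worst-case figure above combines treatment proportions with per-treatment mortality. The following sketch re-derives it under the assumptions stated in the text (no net mortality effect from hormone therapy or radiotherapy); the variable names are illustrative.

```python
# Treatment proportions among screen-detected cancers, as quoted in the text.
p_surgery = 0.99
p_reconstruction_given_surgery = 0.057   # mastectomy with immediate reconstruction
p_chemotherapy = 0.27

# Worst-case mortality per treated patient, as assumed in the text.
mortality_reconstruction = 0.003         # upper bound of <0.3%
mortality_chemotherapy = 0.005           # 0.5%

# Hormone therapy and radiotherapy are assumed to contribute no net mortality.
deaths_surgery = p_surgery * p_reconstruction_given_surgery * mortality_reconstruction
deaths_chemotherapy = p_chemotherapy * mortality_chemotherapy
total_adverse_mortality = deaths_surgery + deaths_chemotherapy

print(f"surgery:      {deaths_surgery * 1000:.2f} per 1000 patients")
print(f"chemotherapy: {deaths_chemotherapy * 1000:.2f} per 1000 patients")
print(f"total:        {total_adverse_mortality:.2%}")
```

The unrounded components are slightly different from the 0.2 and 1.3 per 1000 quoted in the text, but the total agrees with the stated 0.15%.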
The panel concludes that the excess mortality from the investigation and treatment of invasive breast cancer is small and outweighed by the benefits of the treatment.
For DCIS, the benefits of radiotherapy or hormone therapy are in terms of recurrence rather than a reduction in mortality, but the absolute risks of such treatment in terms of mortality are likely to be very small. For patients with screen-detected breast cancer, there is no evidence that these risks are any greater than in the symptomatic population. Nevertheless, for women diagnosed with a breast cancer that would never have become symptomatic, there is a real, but very small, mortality risk from being screened.
5.3 Women’s perceptions of screening
The development of new information to accompany cancer screening invitations was not in scope for this review and is being dealt with separately. Women’s perspectives on overdiagnosis and whether they see it as a key issue in their screening decisions had not previously been investigated, so Cancer Research UK commissioned some qualitative research to investigate this. The findings, from one focus group attended by panel members, are presented briefly here for information (Appendix 5), but academic papers, focusing on a larger sample of qualitative research, will follow publication of this report.
These women understood the concept of screening and most had attended. Although they understood breast cancer, and many knew people who had had it, they had little concept of DCIS and overdiagnosis. Their opinions are not mainly informed by the screening leaflet, and it would appear many do not read it in detail. Thus, informing women about screening will involve much more than simply re-writing the leaflet.
5.4 Cost-effectiveness of breast screening
It was not in the panel’s remit to review the data relating to the costs or the cost-effectiveness of breast cancer screening. The Department of Health in England has provided funds of about £100 million per year to deliver the current screening programme (NHS Breast Screening Programme, 2012).
If one were to take the well-founded cost-effectiveness approach such as that employed by the National Institute for Health and Clinical Excellence (NICE) when reviewing a health technology, it would be important to establish the costs not only of the intervention, but of all subsequent interventions, both in those invited to be screened and those not offered screening. No such data are available for any of the randomised trials, and thus this panel is not in a position to consider the full costs of a breast screening programme, including the financial costs to the NHS of any overdiagnosed cancers.
Thus, although it has been estimated that the UK NHSBSP comes within the NICE cost/quality-adjusted life year threshold of £20 000–30 000 (Advisory Committee on Breast Cancer Screening, 2006), the panel is not able to comment on this, as it has not been able to scrutinise the costs of treatment with and without screening, including the costs of treating the cancers that are overdiagnosed.
We can, however, make general comparisons with other interventions and see that, in terms of lives saved per year, breast cancer screening is of a similar order of magnitude as cervical screening, bowel cancer screening using faecal occult blood testing, and the use of statins (Table 3).
6. Conclusions and recommendations
6.1 Recommendations for further research
The panel’s review of the randomised trials of breast screening leads to the following recommendations about future research priorities:
An individual participant data meta-analysis of the breast screening trials is in progress. This should help resolve some (but not all) of the concerns that have been raised about individual trials and their combined interpretation. The panel supports this enterprise, and is disappointed that it had not already been done a long time ago.
The impact of breast screening outside the ages 50–69 years is very uncertain. The panel supports the principle of the ongoing trial in the United Kingdom for randomising women under age 50 and above age 70 to be invited for breast screening.
The panel’s review of overdiagnosis leads to their support for further research into DCIS, in particular:
A proposed study to examine the need for treatment of low-grade DCIS
Continued support for the Sloane project, which has an extensive database of screen-detected cases of DCIS, and the long-term follow-up of these cases may well improve our understanding of this condition (The Sloane Project 2010).
Current mammographic screening techniques now detect many more cases of DCIS than in the trials. The appropriate treatment of these is uncertain, because there is limited information on their natural history (section 4.6). The panel supports studies to elucidate the appropriate treatment of screen-detected DCIS.
Work on improved screening and pathological techniques that can predict prognosis more effectively.
The panel also supports:
A re-evaluation of the cost-effectiveness of the NHS breast cancer screening programme that takes into account the conclusion of this report.
Breast screening extends lives. The panel’s review of the evidence on benefit – the older RCTs, and those more recent observational studies judged to be relevant – points to a 20% reduction in mortality in women invited to screening. A great deal of uncertainty surrounds this estimate but it represents the panel’s overview of the evidence. This corresponds to one breast cancer death averted for every 235 women invited to screening, and one death averted for every 180 women who attend screening.
The breast screening programmes in the United Kingdom, inviting women aged 50–70 every 3 years, probably prevent about 1300 breast cancer deaths a year, equivalent to about 22 000 years of life being saved; a most welcome benefit to women and to the public health.
But there is a cost to women’s well-being. In addition to extending lives by early detection and treatment, mammographic screening detects cancers, proven to be cancers by pathological testing, that would not have come to clinical attention in the woman’s life were it not for screening - called overdiagnosis. The consequence of overdiagnosis is that women have their cancer treated by surgery, and in many cases radiotherapy and medication, but neither the woman nor her doctor can know whether this particular cancer would be one that would have become apparent without screening and could possibly lead to death, or one that would have remained undetected for the rest of the woman’s life.
The answer the panel sought was to the question of the level of overdiagnosis in women screened for 20 years and followed to the end of their lives. Estimates abound of overdiagnosis, from near to zero to 50%, but there are no reliable data to answer this question. There has not even been agreement on how to measure it. On the basis of follow-up of three RCTs, the panel estimated that in women invited to screening, about 11% of the cancers diagnosed in their lifetime constitute overdiagnosis, and about 19% of the cancers diagnosed during the period that women are actually in the screening programme. However, the panel emphasises, these figures are the best estimates from a paucity of reliable data. Any excess mortality stemming from investigation and treatment of breast cancer is considered by the panel to be minimal and considerably outweighed by the benefits of treatment.
Putting together benefit and overdiagnosis from the above figures, the panel estimates that for 10 000 UK women invited to screening from age 50 for 20 years, about 681 cancers will be found of which 129 will represent overdiagnosis, and 43 deaths from breast cancer will be prevented. In round terms, therefore, for each breast cancer death prevented about three overdiagnosed cases will be identified and treated. Of the ∼307 000 women aged 50–52 who are invited to screening each year, just over 1% would have an overdiagnosed cancer during the next 20 years. Given the uncertainties around the estimates, the figures quoted give a spurious impression of accuracy.
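The headline ratios in this paragraph can be checked from the four cohort figures it quotes. The sketch below is illustrative arithmetic only; the variable names are not from the report, and, as the text stresses, the precision of the outputs should not be mistaken for accuracy.

```python
# Figures quoted in the text for 10 000 women invited from age 50 for 20 years.
invited = 10_000
cancers_found = 681
overdiagnosed = 129
deaths_prevented = 43

# Share of cancers found during the screening period that are overdiagnosed.
overdiagnosed_share = overdiagnosed / cancers_found

# Overdiagnosed cases for each breast cancer death prevented.
overdiagnosed_per_death_prevented = overdiagnosed / deaths_prevented

# Chance that an invited woman has an overdiagnosed cancer over the 20 years.
individual_risk = overdiagnosed / invited

print(f"overdiagnosed share of cancers found: {overdiagnosed_share:.0%}")
print(f"overdiagnosed per death prevented:    {overdiagnosed_per_death_prevented:.1f}")
print(f"risk per woman invited:               {individual_risk:.2%}")
```

These outputs correspond to the figures of about 19%, about three cases per death prevented, and just over 1% given in the text.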
6.3 Policy recommendations
The panel concludes that the UK breast screening programmes confer significant benefit and should continue. The greater the proportion of women who accept the invitation to be screened, the greater is the benefit to population health in terms of reduction in mortality from breast cancer. However, for each woman the choice is clear: on the plus side, screening confers a reduction in the risk of mortality from breast cancer because of early detection and treatment. On the negative side is the knowledge that she has perhaps a 1% chance of having a cancer diagnosed and treated that would never have caused problems had she not been screened.
Evidence from a focus group the panel conducted, in line with previous similar studies, was that many women feel screening is an offer worth accepting: the treatment of overdiagnosed cancer may cause suffering and anxiety, but that suffering is worth the gain from the potential reduction in breast cancer mortality. Clear communication of these harms and benefits to women is of utmost importance and goes to the heart of how a modern health system should function. There is a body of knowledge on how women want information presented, and this should inform the design of information to the public.
Appendix 1. Terms of Reference, review process and role of the secretariat
Terms of Reference for the breast screening review
The overall aim of the review is to develop an up-to-date (2012) assessment of both the benefits and harms associated with population breast screening programmes. This is a rigorous review of the evidence by an independent panel; it is not a formal systematic review.
The review has been commissioned by Professor Sir Mike Richards, National Cancer Director, England, and Dr Harpal Kumar, Chief Executive of Cancer Research UK.
Up to six independent experts will be appointed to undertake the review. These experts will be nationally and internationally recognised for their expertise in epidemiology and/or medical statistics as well as in current breast cancer diagnosis and treatment, but will not have previously published on the topic of breast screening.
The reviewers will be supported in their work by a small team based at Cancer Research UK who will assist with the collation of relevant research papers and will facilitate the work of the panel. Additional funding for the review will be provided from the Department of Health.
The reviewers will be asked to consider both the evidence from RCTs of breast screening and from observational studies, including prospective follow-up and case–control studies, of the impact of breast screening programmes both in the United Kingdom and elsewhere.
Evidence for the review will be limited to research that has been published or accepted for publication.
The most recent publications from each source (e.g., RCT or cohort) should be considered, but the reviewers may also choose to consider earlier publications from the same source.
In addition to considering individual studies, the reviewers will be asked to consider published systematic reviews and/or meta-analyses of breast screening in different countries/jurisdictions. The reviewers will also be asked to consider published methodology papers to assess benefits and harms in breast screening studies.
The reviewers will be expected to understand the arguments that have been made in various articles and opinion pieces regarding the benefits and harms of breast screening. The focus, however, will be on the evidence, thus articles providing opinions (either for or against breast screening) that do not contain original data or meta-analyses will not be expected to have a bearing on the conclusions of the review.
The key outputs of the review will be:
An estimate of the likely benefits of breast screening and the range of uncertainty in this estimate.
An estimate of the likely harms of breast screening and in particular the risks of overdiagnosis and the range of uncertainty in this estimate (i.e., patients being diagnosed and treated for cancer that would not have caused problems during their lifetime).
The review will comment on overall effectiveness of screening which, in addition to the above, will depend on participation rates and developments in effective treatment.
If the available evidence permits, assessments of benefits and harms should also be made for different age groups and for different subgroups (e.g., DCIS diagnoses, and socioeconomic, and ethnic groups).
Approach to the review
It will be for the independent reviewers to determine how they wish to conduct the review. In addition to reviewing the published evidence, it is likely they will choose to consult a range of experts in breast screening who have published in this field. These experts could include epidemiologists, statisticians or clinicians. Consultation may be via written communications, interviews, or workshops.
Regular updates on the process of the review will be made available by Cancer Research UK (through a dedicated page on the Cancer Research UK website (http://www.cruk.org.uk/breastscreeningreview)).
The reviewers will be asked to prepare a report for publication by Cancer Research UK. The outputs of the report may also be published in peer-reviewed journals. On completion, the report will be shared with the UK National Screening Committee (NSC) and with Ministers in England, but the NSC will have no input to the content of the report.
It is expected that the initial report will be published by spring/summer 2012. However, it is conceivable that, at that stage, the over-riding view of the independent review group is that further primary research is needed to develop definitive conclusions. If this is the case, the review will make recommendations regarding the balance of evidence as it currently stands.
It is further expected that the review will make recommendations on key messages regarding risks and uncertainty that need to be considered when drafting new communications materials regarding the breast screening programme. This will include considerations of effectiveness.
As outlined in section 2, the panel called on a range of experts to give evidence. The expert witnesses who have presented evidence to the panel and debated points relevant to the review are:
Philippe Autier, Vice President, Population Research, International Prevention Research Institute (iPRI), Lyon, France
Michael Baum, current Director of the Clinical Trials Group at University College London, Professor Emeritus of surgery and visiting Professor of medical humanities, University College London
Dame Valerie Beral, Professor of Epidemiology and Director, Cancer Epidemiology Unit, University of Oxford
Susan Bewley, Consultant Obstetrician and Honorary Senior Lecturer at King’s College London
Stephen Duffy, Professor of Cancer Screening, Wolfson Institute of Preventive Medicine, at Barts and the London School of Medicine and Dentistry, part of Queen Mary University London, UK
Harry de Koning, Professor of Screening Evaluation, Erasmus MC, Rotterdam, the Netherlands
Ian Ellis, Professor of Cancer Pathology, University of Nottingham
Peter Gøtzsche, Director, Nordic Cochrane Centre, Copenhagen, Denmark
Klim McPherson, Emeritus Fellow, Visiting Professor of Public Health Epidemiology, Oxford University
Albert Mulley, Director, The Dartmouth Centre for Health Care Delivery Science and Professor of Medicine, Dartmouth Medical School, Dartmouth, USA
Lennarth Nyström, Associate Professor, Department of Public Health and Clinical Medicine, Umea University, Sweden
Julietta Patnick, Director, NHS Cancer Screening Programmes and Visiting Professor, University of Oxford
Sir Richard Peto, Professor of Medical Statistics & Epidemiology, Co-director of the Clinical Trial Service Unit, University of Oxford
Paul Pharoah, Professor of Cancer Epidemiology, University of Cambridge
Sir Nick Wald, Institute Director, Wolfson Institute of Preventive Medicine, Barts and the London Medical School
Jane Wardle, Professor in Clinical Psychology and Director, Health Behaviour Unit, University College London
Robin Wilson, Consultant Radiologist, The Royal Marsden, London
These expert witnesses also suggested additional scientific evidence for consideration by the panel and provided follow-up information on their evidence, if requested by the panel. The secretariat organised and attended each witness session but did not participate in any discussions. The commissioners of the independent review, Professor Sir Mike Richards and Dr Harpal Kumar, attended some of these sessions but only as observers; they did not participate in any discussions or pose any questions to either the panel or the expert witnesses.
Role of the secretariat
Cancer Research UK and the Department of Health provided the secretariat, acting purely as support to the panel in the practical, writing, and dissemination functions, and having no say in the conclusions or recommendations.
In addition, the secretariat collated a bibliography of all scientific research papers and reports that had been brought to the panel’s attention by experts from both sides of the screening debate. The secretariat also provided additional specific research papers that the panel wished to consider. In addition to providing the modelling study, showing the impact the various assumptions used to calculate the level of ‘overdiagnosis’ can have on these estimates (Appendix 6), Nick Ormiston-Smith provided cancer incidence, mortality, and survival statistics and ran statistical analyses as requested and instructed by panel members. The secretariat also organised a focus group with women of screening age in collaboration with the Cancer Research UK Health Behaviour Research Centre at University College London, as requested by the panel.
Appendix 2. Changes in breast cancer management and mortality
Since the late 1980s, there have been three main changes in breast cancer management:
• Treatment
• Organisation of services
• Population screening
Treatment
Surgery: There has been a shift from mastectomy to breast conservation (lumpectomy and radiotherapy), and to formal staging of the axilla, latterly by sentinel node biopsy.
Radiotherapy: Trials have established the role of radiotherapy, following lumpectomy and, for selected patients, following mastectomy.
Adjuvant systemic therapy: Trials have established that for patients with oestrogen receptor (ER)-positive invasive breast cancer, tamoxifen (or, for postmenopausal patients, aromatase inhibitors) reduces the risk of relapse and improves long-term survival. Adjuvant chemotherapy was initially introduced for high-risk premenopausal patients, using the CMF regimen; then, as its benefits were appreciated, postmenopausal and lower-risk patients were also treated, and anthracycline- and/or taxane-containing regimens were also used (with further benefit). More recently, for the minority of women with HER2-positive breast cancer treated with chemotherapy, trials have confirmed that the addition of trastuzumab further improves survival.
Organisation of services
The management of breast cancer in the United Kingdom was formerly considered part of general surgery, pathology, radiology, and oncology. There has been a shift (in part due to the setting up of specialist screening services) to all breast cancer patients being seen in specialist units and decisions about management being considered at specialist multidisciplinary team meetings. This was an incremental process, and the improvements in, for example, surgical staging have helped better targeting of treatment (e.g., knowledge of the nodal status assists in selecting patients for chemotherapy, and knowledge of the ER status in selecting patients for adjuvant hormone therapy).
Population screening
Most of the issues about screening are discussed elsewhere in this report.
Changes in mortality
The graph (Figure A2) shows the changes in breast cancer mortality for different age groups over a 40-year period.
The increases in mortality seen in the early period (1971 to mid 1980s) presumably follow on from the increase in incidence. The pattern by age follows that of the incidence changes with the peak incidence in older patients and little change in the under 50s. This is what one would expect if there was little change in treatment (as was the case).
The fall in breast cancer mortality starts in the late 1980s, affects all age groups, and has continued to 2007. The figures in the table below examine these changes in more detail (GROS 2010, ONS 2011, NISRA 2011). Deaths from breast cancer can be expressed as standardised mortality rates, which are very useful for comparing populations, but also as absolute numbers of deaths, reflecting the realities in terms of the burden on the health service.
The figures confirm a reduction in both the breast cancer mortality and the absolute number of deaths from breast cancer in all age groups. It should, however, be noted that:
the reduction in breast cancer mortality is most marked in the under 70s.
breast cancer mortality rises markedly with age, but the relative contribution of breast cancer deaths to the total number of deaths falls with age (reflecting the increase in other causes of death with age).
accompanying the fall in breast cancer mortality and deaths, there has been a marked fall in other causes of death, particularly in the 60–79 age group.
the net effect is that the relative contribution of breast cancer to total deaths has fallen in those <59 years but has (modestly) risen in those aged ≥60.
The screening programme would be expected to impact only on the deaths of women ≥55 years, and it is apparent from the figures that the overall effect on mortality will be attenuated by the impact of other causes of death (e.g., in the 70–79 age group, breast cancer accounts for <5% of all female deaths).
It should also be noted that breast cancer mortality rates vary by country. Breast cancer mortality in the United Kingdom in the late 1980s was about 40 (per 100 000 population), whereas in Sweden (where many of the screening trials were carried out) and Norway it was about 26. The undoubted improvements in the United Kingdom mortality have only brought it down to the starting level in Sweden/Norway (where it was down to 22 by 2006). Comparisons of mortality need to include both absolute levels and changes and understanding of the reasons for baseline differences.
Many of the early trials of adjuvant therapy occurred concurrently with the screening trials, and so the changes mentioned above were introduced concurrently with the introduction of screening. It is thus difficult to disaggregate the individual contributions of each to the undoubted improvements in breast cancer mortality. The organisational changes are not the product of RCTs and are multifaceted, so their contribution, although probably real (Kesson et al, 2012), is the most difficult to quantify. There has been a significant improvement in breast cancer mortality in all ages, part of which is certainly due to improvements in treatment. Changes in mortality also reflect factors affecting incidence as well as presentation and organisational arrangements. Crude mortality statistics are the summation of these factors but do not of themselves indicate the relative contributions.
Appendix 3. Case–control studies
Case–control study selection
General medical literature was searched using PubMed for the period 1970 to present in order to identify case–control studies that assessed the effect of screening mammography on breast cancer mortality. The following search terms were used in locating the articles: ‘breast cancer’, ‘screening’, ‘screening mammography’, ‘breast cancer mortality’, ‘breast cancer death’, ‘screening case–control’ and ‘screening case referent’. A total of 21 case–control studies (see Table A3) were identified as considering breast cancer mortality and were compared against a list of case–control references shared by expert witnesses. The case–control studies showed breast screening to confer a greater benefit than did the trials. Although these studies, in general, attempted to control for non-comparability of screened and unscreened women, the panel was concerned that residual bias could inflate the estimate of benefit. There was also a larger number of case–control studies investigating comparisons between lifestyle risk factors and detection of abnormalities at screening, and early- and late-stage cancers. These latter studies were not used, as they did not directly provide an estimate of screening benefit.
Appendix 4. Formulae used to calculate overdiagnosis
Formulae for calculating overdiagnosis from de Gelder et al (2011):
(E−D)/T0, age 0–100 years: the relative increase in breast cancers due to overdiagnosis (E−D) compared with the predicted number of breast cancers in the female population aged 0–100 years in a situation without screening.
(E−D)/T0, screening age and older: the relative increase in breast cancers due to overdiagnosis (E−D) compared with the predicted number of breast cancers in women of the screening age and older in a situation without screening.
(E−D)/T0, screening age: the relative increase in breast cancers due to overdiagnosis compared with the predicted number of breast cancers in women of the screening age in a situation without screening.
(E−D)/T1, screening age: the fraction of overdiagnosed cancers of all diagnosed breast cancers in women of the screening age in a situation with screening.
(E−D)/SD: the fraction of all screen-detected (SD) cancers that is overdiagnosed.
T1, screening age/T0, screening age: the RR of breast cancer for women of the screening age in a situation with screening compared with the predicted number of breast cancers in women of the same age in a situation without screening. The estimator can be corrected for lead time, for instance by shifting the predicted incidence without screening forward in time.
T1, screening age/(T1, screening age, corrected): the RR of breast cancer for women of the screening age in a situation with screening compared with the predicted number of tumours in a situation with screening if no overdiagnosis took place (T1, screening age, corrected).
D: number of deficit breast cancers in the age groups exceeding the screening limit, calculated as the difference in the number of breast cancers without and with screening; DCIS: ductal carcinoma in situ; E: number of excess breast cancers in the screening ages, calculated as the difference in the number of breast cancers with and without screening; SD: number of screen-detected cancers; T0: predicted number of breast cancers in the absence of screening; T1: modelled total number of breast cancers in the presence of screening; T1, corr: total number of breast cancers in the presence of screening minus the number of overdiagnosed cancers.
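The seven estimators listed above differ only in the denominator applied to the same overdiagnosed count (E−D), or in the ratio of totals they report. The following Python sketch computes them side by side. It is an illustration under stated assumptions, not code from de Gelder et al: the class and function names are hypothetical, and the example inputs are invented solely to show how widely the estimate can vary with the choice of denominator.

```python
from dataclasses import dataclass


@dataclass
class ScreeningCounts:
    """Inputs named after the abbreviations in the key above."""
    E: float               # excess cancers in the screening ages
    D: float               # deficit cancers above the screening age limit
    SD: float              # screen-detected cancers
    T0_all_ages: float     # predicted cancers without screening, age 0-100 years
    T0_screen_plus: float  # predicted cancers without screening, screening age and older
    T0_screen: float       # predicted cancers without screening, screening age
    T1_screen: float       # cancers with screening, screening age


def overdiagnosis_estimators(c):
    """Return each estimator from the list above, keyed by its formula."""
    over = c.E - c.D                  # overdiagnosed cancers
    t1_corrected = c.T1_screen - over  # T1 with overdiagnosed cancers removed
    return {
        "(E-D)/T0, age 0-100 years": over / c.T0_all_ages,
        "(E-D)/T0, screening age and older": over / c.T0_screen_plus,
        "(E-D)/T0, screening age": over / c.T0_screen,
        "(E-D)/T1, screening age": over / c.T1_screen,
        "(E-D)/SD": over / c.SD,
        "T1/T0, screening age": c.T1_screen / c.T0_screen,
        "T1/T1 corrected, screening age": c.T1_screen / t1_corrected,
    }


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    example = ScreeningCounts(E=150, D=50, SD=400, T0_all_ages=2000,
                              T0_screen_plus=1600, T0_screen=1000, T1_screen=1150)
    for formula, value in overdiagnosis_estimators(example).items():
        print(f"{formula}: {value:.3f}")
```

With these invented inputs the same 100 overdiagnosed cancers are reported as anything from 5% to 25%, which illustrates why published estimates are hard to compare unless the denominator is stated.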
Appendix 5. Focus group
Nine women from the London area and within the breast screening age range (50–71 years) were invited to join a focus group to discuss their reasons for attending screening or not, and to comment on information on the risk of overdiagnosis and DCIS. All women spoke fluent English, had no previous personal history of cancer and came from a range of socioeconomic backgrounds. These women were recruited from a market research recruitment database, hosted by Saros (http://www.saros-research-recruitment.com). Saros screened eligible participants via email or phone. The group was facilitated by Dr Jo Waller from the Cancer Research UK Health Behaviour Research Centre at University College London and observed by two panel members.
Accepting or declining an invitation to screen: Seven out of the nine attendees had accepted an invitation to be screened in the past, which is close to the current UK average. The main reasons expressed for attending were an assumed feeling that attending screening is beneficial (the perceived benefits are that finding the disease earlier means better outcomes and that a negative scan provides peace of mind), awareness of breast cancer (‘lots of people are getting it’) and the fact that you receive a specific invitation to attend. The main reason mentioned for not attending was the anticipated pain of the actual mammogram and the embarrassment of the technique. Some of the women who had previously attended screening had experienced discomfort and mentioned that this might deter them from accepting another invitation in the future. There was a general consensus among the group that no screening programme can be perfect, and that some cancers may be missed but the women had not been particularly aware of DCIS and overdiagnosis. The screening attendees felt that the screening programme was well organised, but most agreed that they would be less likely to attend if not specifically invited.
The knowledge that you may be diagnosed with and treated for a slow growing tumour that would never have caused you problems in your lifetime did not appear to change this group’s intention to accept another screening invitation. There was a general consensus, in this group and others, that attending screening and possible subsequent decisions on treatment if cancer or DCIS are found were two separate issues.
The women were surprised, however, to learn that doctors cannot always tell whether a tumour is likely to cause harm or not, but felt that the treatment decision was one to be made by the woman after discussions with her consultant. There was a feeling that doctors would not recommend treatment for cancer if they did not think it was appropriate. This is in line with findings from other qualitative research (Dr Jo Waller, personal communication).
There was more concern about the potential radiation risk: ‘For every 14 000 women screened regularly for 10 years, one woman may develop breast cancer she will die from because of the radiation from the mammograms’ (NHS Breast Screening leaflet, 2011) and the high number of women recalled for further tests after the initial mammogram (about one in every 20 screened), rather than the concept of overdiagnosis.
Information about screening: Some of the women in this focus group expected the information in the breast screening leaflet to boost uptake of screening invitations, and felt that it should therefore be written in non-alarmist terms. Many could not recall whether they had read the leaflet when they were last invited and nobody could remember what information it actually contained. The women indicated that their decision to accept or decline an invitation to screen was unlikely to be influenced by information in this leaflet. This group also expected the leaflet to focus on what to expect when attending for a screen, notably the procedure of the mammogram. However, they also felt that some basic information about risks and benefits should be included for those women who wanted it.
Appendix 6. Modelling overdiagnosis using time trends
The most reliable estimates of overdiagnosis come from three RCTs in which women in the control group were not offered screening at the end of the trial. However, these randomised trials of breast screening date from the 1980s or earlier. The setting of the trials is not necessarily directly comparable with the current screening programme (for example, because of different technology used in mammography and changes in the underlying breast cancer risk). Many researchers have therefore attempted to use observational studies to estimate the extent of overdiagnosis contemporaneously.
One method of estimating the level of overdiagnosis is the extrapolation method (ONS, 2012). This method predicts the expected level of breast cancer if there were to be no mammography screening, and compares it with the actual observed level.
Estimates produced using this basic extrapolation method differ for a number of reasons, one of which is the choice of assumptions used in the modelling (ONS, 2012). A number of models were therefore run to consider the effect of applying different assumptions and the impact this has on the estimate of overdiagnosis due to breast screening.
The method uses a regression model to predict the expected level of breast cancer diagnosis in the age group targeted for screening in the absence of screening, and then calculates the difference between this expected level and the observed data; this difference is the excess due to screening (Figure A6.1). A similar analysis for the older-age group was then undertaken to calculate the size of a compensatory drop. Compensatory drop is the relative decrease in the incidence of a cancer in a screened population compared with an unscreened population, once screening stops. It arises because screening detects the cancer earlier, so cases that would have presented symptomatically have already been diagnosed during screening. The overall estimate of overdiagnosis is therefore the number of excess cases due to screening minus the size of this compensatory drop. This was calculated for each year in the analysis and the average taken so that the results from all the models with different periods could be compared.
This extrapolation method assumes that the risk of breast cancer would have continued to increase at a constant rate after the end of the period used to estimate the expected level of breast cancer. In addition, it assumes that the quality of case ascertainment by registries and diagnostic methods has remained stable over time.
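The steps of the extrapolation method can be sketched in a few lines of Python. This is a simplified illustration, not the analysis run for the panel: it fits a straight line to yearly case counts (rather than age-specific rates), all function names are hypothetical, and the data in the example are synthetic, constructed so that the answer is known in advance.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_gap(cases_by_year, pre_end, era_start, era_end, pre_start=1975):
    """Mean of (observed - expected) cases over the screening era.

    The expected level is the pre-screening trend extrapolated forward,
    i.e. it assumes the trend would have continued unchanged.
    """
    pre_years = [y for y in cases_by_year if pre_start <= y <= pre_end]
    slope, intercept = fit_line(pre_years, [cases_by_year[y] for y in pre_years])
    era_years = [y for y in cases_by_year if era_start <= y <= era_end]
    gaps = [cases_by_year[y] - (intercept + slope * y) for y in era_years]
    return sum(gaps) / len(gaps)


def overdiagnosis_estimate(target_cases, older_cases, pre_end, era_start, era_end):
    """Excess in the target age group minus the compensatory drop in older women."""
    excess = mean_gap(target_cases, pre_end, era_start, era_end)
    # A shortfall in the older group is a negative gap, so negate it.
    compensatory_drop = -mean_gap(older_cases, pre_end, era_start, era_end)
    return excess - compensatory_drop


if __name__ == "__main__":
    # Synthetic data: a steady trend, then +300 cases a year in the target
    # group and -100 cases a year in the older group once screening starts.
    years = range(1975, 2005)
    target = {y: 1000 + 20 * (y - 1975) + (300 if y >= 1989 else 0) for y in years}
    older = {y: 800 + 10 * (y - 1975) - (100 if y >= 1989 else 0) for y in years}
    print(overdiagnosis_estimate(target, older, pre_end=1988, era_start=1989, era_end=2004))
```

Changing `pre_end`, `era_start` or `era_end` on real data shifts the fitted trend and therefore the estimate, which is the sensitivity the modelling set out to examine.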
In total, 2250 regression models, using both linear and Poisson regression, were applied to the age-specific incidence rates in England from 1975 to 2004 using a combination of different assumptions, and a range of different overdiagnosis estimates were produced. The assumptions examined were:
the pre-screening era period: this varied from 1975–1984 to 1975–1988 (and all intermediate years).
the target-screening age group: this included age groups 50–64 and 45–64, and Poisson regression models that considered two age categories separately, 45–49 and 50–64.
the post-screening age group: this included age groups 65–74 and all women ≥65, and Poisson regression models that considered two age categories, 65–69 and all women ≥70.
the screening era period: the start of the screening period was allowed to vary between 1989 and 1993 and the end between 2002 and 2004.
Table A6.1 sets out the different model specifications used in the modelling.
Standard Models: All of the combinations of age groups and time periods, described above, were modelled using both linear regression (300 models) and Poisson regression (675 models). These analyses are known here as the standard regression models.
Calculation of the Compensatory Drop: In addition to the standard models described above, the calculation of the compensatory drop (using either the difference of counts or the rate ratio) was investigated. The results of the standard regression models have been expressed using the difference of counts in the observed and expected data. To investigate the effect of calculating the compensatory drop using a different method, the outcomes of the linear regression standard models using the difference of counts were compared with those of the same models with the compensatory drop calculated using the rate ratio of the observed and expected data. This analysis was then repeated with the rate ratio applied only to the final year in each set of linear regression results.
Model Adjustment: Finally, an adjustment was applied to the standard Poisson regression model estimates to take account of increasing incidence in the mostly unscreened under 45 age group. These results were compared with the standard Poisson regression model estimates.
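The model counts quoted above can be checked by enumerating the combinations of assumptions. The Python sketch below does so; the option labels are illustrative, and two points are assumptions rather than statements from the text: that the Poisson models have three age-group options each (the two used for linear regression plus the split-category option), and that the total of 2250 is made up of the standard models plus the two rate ratio variants of the linear models and the adjusted Poisson models.

```python
from itertools import product

PRE_SCREENING_END = range(1984, 1989)  # pre-screening era ends 1984..1988
SCREENING_START = range(1989, 1994)    # screening era starts 1989..1993
SCREENING_END = range(2002, 2005)      # screening era ends 2002..2004

LINEAR_TARGET = ["50-64", "45-64"]
LINEAR_POST = ["65-74", "65+"]
# Assumption: Poisson models add one option with two age categories modelled separately.
POISSON_TARGET = LINEAR_TARGET + ["45-49 and 50-64"]
POISSON_POST = LINEAR_POST + ["65-69 and 70+"]


def specifications(target_groups, post_groups):
    """All combinations of the assumptions for one regression type."""
    return list(product(PRE_SCREENING_END, target_groups, post_groups,
                        SCREENING_START, SCREENING_END))


linear = specifications(LINEAR_TARGET, LINEAR_POST)
poisson = specifications(POISSON_TARGET, POISSON_POST)

# Assumption (one reading consistent with the stated total):
#   linear  x3: standard, rate ratio over all years, rate ratio for the final year
#   Poisson x2: standard, adjusted for incidence in women under 45
total_models = 3 * len(linear) + 2 * len(poisson)

if __name__ == "__main__":
    print(len(linear), len(poisson), total_models)
```

Under these assumptions the enumeration gives 300 linear and 675 Poisson specifications, matching the counts of standard models stated above.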
The results of the 2250 regression models produced a range of estimates from 0 to 6552 or 214 per 10 000 women invited to screening (Figure A6.2). A total of 10% of the results were <1150 (or 37 per 10 000 women invited to screening) and 10% above 4115 (or 134 per 10 000 women invited to screening). These figures are not directly comparable with the 129 per 10 000 women invited to screening in the main report because they do not include DCIS. Three assumptions had the biggest effect on the results of the modelling: the adjustments made to the regression technique, end of pre-screening era, and target age group.
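The conversion between the two units used above (women per year, and women per 10 000 invited to screening) can be illustrated as follows. The sketch assumes that the denominator is the yearly cohort of about 307 000 women invited, as quoted in the main report; the function name is hypothetical. Because that cohort size is itself rounded, the results agree with the printed figures only to within about one per 10 000.

```python
# Assumption: approximate yearly cohort of women invited, from the main report.
INVITED_PER_YEAR = 307_000


def per_10_000_invited(women_per_year, invited_per_year=INVITED_PER_YEAR):
    """Express a yearly overdiagnosis count per 10 000 women invited."""
    return women_per_year / invited_per_year * 10_000


if __name__ == "__main__":
    for estimate in (1150, 4115, 6552):
        print(estimate, round(per_10_000_invited(estimate), 1))
```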
Standard Models: Choosing a standard Poisson regression model rather than a standard linear regression model increases the estimates of overdiagnosis by an average of 316 women per year (range 80–476). The choice of target age group 50–64 rather than the age group 45–64 reduced the estimates of overdiagnosis by an average of 379 across all of the regression models. Extending the end of the pre-screening era from 1984 to 1988 reduced the estimates of overdiagnosis by an average of 1083 using the linear regression models and 1314 using the Poisson regression models. The length of the pre-screening era, which varied from 1975 to 1984 or 1988 (and all years in between), has an impact because the rate of breast cancer diagnosis increased during this 5-year period.
The length of the screening era and the age groups used in the post-target age group had much less of an effect. In the standard linear regression model, starting the screening era at 1991 had the biggest effect, increasing the overdiagnosis estimate by an average of 123 women per year compared with 1993 (the lowest year). In the standard Poisson regression model, 1991 was also the highest year, on average 158 higher than 1993 (the lowest estimate).
Calculation of the compensatory drop: When the rate ratio was applied rather than the difference of counts to calculate the compensatory drop on a set of standard linear regression estimates, the estimates of overdiagnosis were increased by, on average, 295 women per year. If only the last year of the screening period was used rather than the average across all years, then this method increased the estimate of overdiagnosis by 1469 women per year on average. This difference was primarily driven by the choice of the pre-screening era (1984 through to 1988): in the standard linear regression model this choice can change the estimate by up to 1083 women (Table A6.2), whereas with the rate ratio method applied to the last year the estimate varied by an average of 2420 women.
Model adjustment: Applying an adjustment to take account of increasing incidence in women under 45 years to the standard Poisson regression model estimates reduced the estimates, on average, by 1161 women. This was also driven by the length of the pre-screening era, but the results were not incremental in the same way as the linear regression results. The models that used 1984 and 1985 as the end of the pre-screening era had very low estimates and the models that used 1986, 1987, and 1988 had relatively larger estimates.
The best method of assessing both the positives and negatives of breast screening would be a randomised controlled trial. In the absence of an RCT and with publicly available data, the level of overdiagnosis can be estimated by extrapolation. However, the results are sensitive to the assumptions used to set up the model, and are limited by the age extension roll out between 2002 and 2004.
The decision on how to adjust the regression modelling has the biggest impact on the results. However, it is unclear which adjustment to the model best represents what the level of breast cancer would be in the absence of screening.
This extrapolation method assumes that the risk of breast cancer would have continued to increase at a constant rate after the end of the period used to estimate the expected level of breast cancer. In addition, it assumes that the quality of case ascertainment by registries and diagnostic methods has remained stable over time. Although in theory it would be possible to adjust for these effects, doing so in practice would create further uncertainty in the estimates, because different adjustment methods would produce a further range of possible overdiagnosis estimates.