Diagnosing patients at non-advanced stage is becoming a mainstay of contemporary cancer prevention and control strategies, complementing cancer prevention and screening.1,2,3,4 Most cancer patients are diagnosed after symptom onset, as effective screening tests exist for only a few cancer sites and participation in screening programmes is suboptimal.5 Therefore, in addition to efforts to optimise participation in screening, public health policies in many countries focus on shortening intervals from symptom onset to diagnosis to achieve population-level reductions in advanced stage cancer.

In 2018, the UK Government set out a target for 75% of cancer patients with common solid tumours to be diagnosed in tumour, node, metastasis (TNM) stage I or II by 2028.6 However, how improvements in stage distribution could be achieved within a decade remains uncertain. An appealing strategy to help improve the stage distribution of incident cases is the reduction of stage inequalities by socio-demographic groups.

Several demographic and psychosocial factors are associated with the length of time from symptom onset to presentation (i.e. the ‘patient interval’7), and related markers (such as awareness of cancer symptoms or reported psychological or practical barriers to presentation).8,9 Specifically, older individuals and those of lower socioeconomic status tend to have lower awareness of likely cancer symptoms, to report more practical or emotional barriers to presentation and help-seeking, and to experience longer intervals to diagnosis.8,9 Socio-demographic differences in stage at diagnosis are therefore likely even in populations served by healthcare systems without financial barriers to accessing care, though such associations may vary between different cancer sites.

Studies from various countries have documented socio-demographic variation in stage at diagnosis but chiefly focussed on the most common, ‘screenable’ cancers.10,11,12,13,14 Previous UK studies also tended to focus on regional, sub-national, populations and earlier eras.15,16,17,18 Motivated by the above considerations, we aimed to examine stage at diagnosis for common cancers in England and related socio-demographic variation in a recent study period with highly complete information on stage at diagnosis. We focussed on the cancer sites that have been used in public reporting of stage at diagnosis in local areas in England and that relate to the current UK government target for improving diagnosis of cancer at an earlier stage by 2028.6



Data were analysed on incident cases aged 30–99 years diagnosed in 2015 with colon [International Classification of Diseases, Tenth Revision: C18], rectal [C19–C20], lung [C33–C34], melanoma [C43], (female) breast [C50], uterine [C54], ovarian [C56], prostate [C61], renal [C64], and bladder [C67] cancer registered in the English population-based cancer registry run by the Public Health England National Cancer Registration and Analysis Service. Beyond cancer site and integrated tumour TNM stage at diagnosis, the main analysis used information on the following variables: age (years), sex, small area deprivation group (fifth of the income domain of the Index of Multiple Deprivation 2015 score of the Lower Layer Super Output Area of the patient’s residence), screening detection status (for breast, colon, and rectal cancers only), and morphology group. Additional (auxiliary) variables were used in the imputation model as described below.


Parameterisation of stage at diagnosis

Consistent with existing reporting conventions in England and prior literature,17,18,19 in the main analysis we categorised stage at diagnosis as advanced/non-advanced using TNM stage III/IV and I/II, respectively. We explored alternative parameterisations in sensitivity analysis (see below).

Imputation of missing stage

Information on stage at diagnosis was 91% complete overall (range: 85% for ovarian to 94% for endometrial cancer). In our analysis, stage was the outcome variable; unlike the case for exposure variables, imputation of outcome variables is generally considered of limited value.20 However, if auxiliary variables are available, multiple imputation reduces bias and increases power compared with complete case analysis.21 Therefore, in the main analysis, we completed information on stage at diagnosis using multiple imputation by chained equations, separately for each cancer site. The imputation models included all variables used in main analysis (i.e. age, sex, deprivation group, screening detection status and morphology, parameterised as in main analysis) and several auxiliary variables, including survival (see Supplementary Information—Text Box). Ten imputations were produced for each site.
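As an illustration of how an estimate obtained in each of the ten imputed data sets can be combined into a single result, the sketch below pools a log odds ratio using Rubin's rules. This is an assumption about the pooling step rather than a description of the authors' code (Stata's multiple imputation estimation commands would normally handle it internally); the function name and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: point estimates (e.g. log odds ratios), one per imputation.
    variances: squared standard errors, one per imputation.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log odds ratios and variances from three imputations (not study results).
pooled, total = pool_rubin([0.10, 0.20, 0.30], [0.01, 0.01, 0.01])
```

The total variance exceeds the average within-imputation variance whenever the imputations disagree, which is how uncertainty about the missing stage values carries through to the confidence intervals.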

Statistical analysis

Initially, we described the number of patients in the data set by socio-demographic characteristic and cancer site and compared the observed and imputed proportions diagnosed at advanced stage by each variable.

We subsequently used three different logistic regression models with robust standard errors, with advanced/non-advanced stage categories being the binary outcome variable. First (Model 1) we described variation in advanced stage across the analysis sample (all ten studied cancer sites) adjusting for age at diagnosis (in years; modelled using a restricted cubic spline with knots at 40, 45, 55, 65, 75 and 85 years), sex, deprivation group and cancer site.
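To make the age term concrete, the following sketch builds a restricted cubic spline basis for the six knots listed above. It assumes the commonly used truncated power parameterisation described by Harrell; the text does not state which parameterisation the software applied, and the function name is hypothetical.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for one value x (truncated power form).

    Returns k - 1 columns for k knots: x itself plus k - 2 nonlinear terms,
    constrained to be linear below the first knot and above the last knot.
    """
    k = len(knots)
    if k < 3:
        raise ValueError("need at least three knots")
    t_first, t_penult, t_last = knots[0], knots[-2], knots[-1]
    scale = (t_last - t_first) ** 2   # conventional normalisation
    width = t_last - t_penult

    def cube_plus(u):
        # Truncated cubic: u^3 when u is positive, otherwise zero.
        return u ** 3 if u > 0 else 0.0

    columns = [float(x)]
    for t_j in knots[:-2]:
        term = (
            cube_plus(x - t_j)
            - cube_plus(x - t_penult) * (t_last - t_j) / width
            + cube_plus(x - t_last) * (t_penult - t_j) / width
        )
        columns.append(term / scale)
    return columns


AGE_KNOTS = [40, 45, 55, 65, 75, 85]     # knot positions stated in the text
basis_at_70 = rcs_basis(70, AGE_KNOTS)   # five columns for a patient aged 70
```

With six knots, age enters the model as five columns, and the fitted curve is forced to be linear on the log-odds scale below 40 and above 85 years, where data are sparse.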

As associations between socio-demographic variables and stage at diagnosis could vary by cancer site, in a second model stratified by cancer site (Model 2) we described socio-demographic differences in stage at diagnosis (by age, sex and deprivation group) for each studied site separately.

For patients with breast, colon and rectal cancers, to gain insight into the variation in stage at diagnosis that is mediated by socio-demographic differences in the proportion of screen-detected cases, we extended the stratified model by also including screening detection status (Model 3).

For five of the ten studied cancers (lung, breast, renal, endometrial, ovarian), there is substantial morphological heterogeneity, and so all models were also adjusted for morphology group for these sites.

In sensitivity analysis, we examined alternative parameterisations of stage, i.e. advanced stage defined as TNM II–IV or IV (in addition to III/IV in main analysis), restricting the analysis to those with recorded stage.

Estimating population-level impact

We estimated the population impact that would result from elimination of differences in stage at diagnosis by age (among patients aged ≥65 years), sex (for cancers other than female breast, prostate, ovarian and endometrial) and deprivation group. Specifically, we predicted the number of cases of advanced stage cancer we would expect if:

  • Everyone aged >65 years had the same risk of advanced stage at diagnosis as those aged 65 years if the latter was lower.

  • Men were to attain the same risk of advanced stage as women, or vice versa, as applicable for the sex with the lower risk.

  • More deprived groups had the same risk of advanced stage at diagnosis as the least deprived group, if these groups had higher risk of advanced-stage cancer.

  • All above three socio-demographic differences were removed.

We applied the mi predictnl command in Stata to estimate, for each individual patient, the difference between the modelled probability of diagnosis at advanced stage and the probability of diagnosis at advanced stage under the counterfactual assumptions. We then summed these differences across patients to estimate the total number of advanced stage diagnoses associated with each inequality.
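The counterfactual calculation for deprivation can be sketched as follows. This is a minimal illustration, not the authors' Stata code: it uses a single data set rather than ten imputations, a model with deprivation group as the only covariate, and hypothetical coefficients and function names.

```python
import math


def inv_logit(z):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical log odds ratios by deprivation group (1 = least deprived).
# These are illustrative values, NOT estimates from the study.
INTERCEPT = -0.5
DEPRIVATION_LOG_OR = {1: 0.0, 2: 0.05, 3: 0.10, 4: 0.15, 5: 0.20}


def p_advanced(group):
    """Modelled probability of advanced stage for a patient in a deprivation group."""
    return inv_logit(INTERCEPT + DEPRIVATION_LOG_OR[group])


def excess_advanced_cases(groups):
    """Expected advanced-stage diagnoses attributable to the deprivation gradient.

    For each patient, take the modelled probability minus the counterfactual
    probability (risk of the least deprived group, applied only when lower),
    then sum the differences over patients.
    """
    reference = p_advanced(1)
    return sum(p_advanced(g) - min(p_advanced(g), reference) for g in groups)


cohort = [1, 1, 3, 5, 5]   # hypothetical deprivation groups for five patients
excess = excess_advanced_cases(cohort)
```

Taking the minimum of the modelled and reference probabilities mirrors the condition in the list above that a group's risk is only changed when the reference group's risk is lower.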


Of the 202,001 incident tumours for the studied cancers diagnosed in 2015, 53% were diagnosed at stage I/II (‘non-advanced stage’), 38% at stage III/IV (‘advanced stage’), and 9% had missing stage (Table 1). After multiple imputation, 57.2% and 42.8% of tumours were diagnosed at non-advanced and advanced stage, respectively.

Table 1 Univariate observed and imputed stage distribution by variable category.

Associations with advanced stage considering all the studied cancer sites together (Model 1)

In the unadjusted analysis, there was notable variation in advanced stage at diagnosis by sex (50 vs. 36% in men/women), age (27 vs. 59% in those aged 30–39/90–99 years) and deprivation group (39 vs. 49% in least/most deprived group patients). For lung, breast, ovarian, endometrial and bladder cancer, there were also substantial differences in stage at diagnosis by morphology type (Table 1). There was also large variation in the percentage of patients diagnosed at advanced stage by cancer site, ranging from lung (74%) and ovarian cancer (72%) to breast cancer (16%) and melanoma (9%).

In the adjusted analysis, there was large variation in advanced stage at diagnosis in older age, with the odds increasing exponentially from 70 years upwards (Fig. 1). However, differences by sex and deprivation were relatively small. There was also very large variation by cancer site, which was substantially greater than that by age. For example, there was a 25-fold difference in the odds of advanced stage disease between lung cancer and melanoma (1/0.04), compared with up to 2-fold difference between those aged 95 and 65 years (1.88/1).

Fig. 1: Adjusted odds ratios for diagnosis at advanced stage—all patients with any cancer site in the analysis sample (‘Model 1’ results).

Confidence intervals are visualised if they are wider than the symbols used to show point estimates.

Cancer site-specific adjusted associations with advanced stage at diagnosis (Model 2)

Age and stage

Age was strongly associated with risk of advanced stage disease for most cancer sites (p < 0.001, Fig. 2), with a monotonic increase in the odds of advanced stage at diagnosis with increasing age for 5 sites (melanoma, ovarian, prostate, renal and endometrial cancer), and a U-shaped association (both the younger and the older patients having relatively high odds of advanced stage) for the other 5 sites (breast, lung, colon, rectal, bladder cancer). For patients with breast, colon or rectal cancer, adjusting for screening detection status flattened differences by age in stage at diagnosis, particularly for breast cancer.

Fig. 2: Adjusted odds ratios for diagnosis at advanced stage by age (30–99 years) from models stratified by cancer site (‘Model 2’ results).

Unless otherwise reported, p < 0.001.

Sex and stage

Across the 6 cancer sites that can occur in either sex, men were at higher risk of advanced stage at diagnosis for melanoma, lung and renal cancer, compared with women; in contrast, women had a higher risk of advanced stage at diagnosis of bladder cancer (Fig. 3). There was no evidence for variation in stage at diagnosis by sex for colon and rectal cancer, with or without adjustment for screening detection status.

Fig. 3: Adjusted odds ratios for diagnosis at advanced stage by sex from models stratified by cancer site (‘Model 2’ results).

Where not reported, p < 0.001.

Deprivation group and stage

Increasing deprivation was associated with higher risk of advanced stage, with some variability by cancer (Fig. 4). For 7/10 sites (bladder, breast, colon, rectal, melanoma, ovarian and prostate cancer), more deprived patients were at higher risk of advanced stage at diagnosis. However, for lung, renal and endometrial cancer there was no evidence for differences in risk of advanced stage cancer between the deprivation groups (p > 0.05). For breast, colon and rectal cancer, additionally adjusting for screening detection status made little difference to associations with deprivation group. Values of data visualised in Figs. 1–4 are included in Supplementary Information—Table 1.

Fig. 4: Adjusted odds ratios for diagnosis at advanced stage by income deprivation from models stratified by cancer site (‘Model 2’ results).

Deprivation group 1 is the least deprived group. Unless otherwise reported, p < 0.001.

Population impact of removing deprivation and ‘older age’ inequalities in stage at diagnosis

For the studied cancers, removing socio-demographic inequalities in odds of advanced stage diagnosis would decrease the proportion of all incident cases diagnosed at advanced stage by 4.1 percentage points, or approximately 8300 patients each year in England (Fig. 5 and Table 2—columns F and G). This would translate to 61.3% of patients with the 10 studied cancers being diagnosed at stage I/II, up from the observed 57.2% (Table 3). Considering the absolute ‘distance’ to the target of 75% of all patients being diagnosed in stage I/II by 2028, such potential elimination would reduce the current ‘gap’ of 17.8 percentage points (75% minus 57.2%) to 13.7 percentage points. In relative terms, potential elimination would help cover 23.1% of the overall ‘distance to target’.
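The ‘distance to target’ arithmetic can be reproduced from the rounded percentages quoted in this paragraph. The variable names are ours, and the inputs are the published rounded values rather than the underlying model output.

```python
TARGET = 75.0           # target % diagnosed at stage I/II by 2028
OBSERVED = 57.2         # % at stage I/II after imputation
COUNTERFACTUAL = 61.3   # % at stage I/II if the inequalities were removed

gap_before = TARGET - OBSERVED        # percentage points still to gain
gap_after = TARGET - COUNTERFACTUAL   # remaining gap after removing inequalities
share_covered = 100 * (COUNTERFACTUAL - OBSERVED) / gap_before

print(f"gap before: {gap_before:.1f} points")
print(f"gap after: {gap_after:.1f} points")
print(f"share of distance covered: {share_covered:.1f}%")
```

With these rounded inputs the share comes out at about 23.0%; the reported 23.1% presumably reflects unrounded estimates.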

Fig. 5: Estimated impact of removing ‘older age’ (among individuals ≥65 years), sex and income deprivation inequalities.

Impact is shown as the reduction in the number of tumours diagnosed in stage III/IV as a percentage of total diagnoses of each cancer site, and of all sites combined.

Table 2 Estimated impact of removing ‘older age’, sex and income deprivation inequalities, based on models adjusting for morphology and screening where appropriate.
Table 3 Percentage improvement in percentage of patients diagnosed in stage I/II by cancer site.

In Table 2 (top, columns F and G), it can also be seen that potential elimination of inequalities in older age would decrease the proportion of all incident cases diagnosed in advanced stage by 2.1% (or around 4200 cases), whereas the corresponding figures for potential elimination of deprivation inequalities are 1.2% (or around 2300 cases) and for potential elimination of sex inequalities 0.8% (or around 1700 cases).

Considering the impact of potential elimination of socio-demographic inequalities in stage at diagnosis by cancer site, the largest absolute percentage increase in stage I/II was observed for prostate (9.1%) and the lowest for breast cancer (2.1%). Focussing on the two cancer sites with the greatest percentage of advanced stage cases (Table 1), for lung cancer removing socio-demographic inequalities would increase the percentage of cases diagnosed in stage I/II from 26% currently to 28%, while for ovarian cancer the respective percentage would increase from 28% currently to 33% (Table 3).

Within each specific cancer site, the relative contribution of older age, deprivation and sex varied considerably. For example, for rectal cancer, elimination of deprivation inequalities alone would account for most of the potential reduction, compared to the contribution of eliminating age or sex inequalities. In contrast, for lung cancer potential elimination of sex inequalities would be most important (Table 2—column G). Values of data visualised in Fig. 5 are included in Supplementary Information—Table 2. We also highlight the relative impact of potential elimination of inequalities in relation to reduction in advanced stage cases as a percentage of advanced stage diagnoses (Table 2—column H and Table 3, last column).

Sensitivity analysis

Complete case analysis provided smaller estimates of the excess risk of advanced stage cancer associated with older age, sex and deprivation (Supplementary Information—Figs. 1–3). In complete case analysis, alternative definitions of advanced stage categories gave similar results for sex and deprivation but different associations with age at diagnosis and cancer site (Supplementary Information—Figs. 4–6).



We provide comprehensive evidence regarding socio-demographic disparities in stage at diagnosis of ten common cancers in a large European country population. Such associations tend to be present, but their strength and direction vary substantially by cancer site. Potential elimination of older age, deprivation and sex inequalities in stage at diagnosis would increase the percentage of patients diagnosed at stage I/II, but the increase would fall short of the ‘75% of cases diagnosed in stage I/II’ target, contributing in proportional terms nearly a quarter of the total improvement needed to attain it.

Comparisons with prior literature

The findings substantially update and expand previous work limited to a single English region (East of England) and relating to an earlier study era (2006–2010).18 The larger (nationwide) sample size of the present study has improved estimate precision such that there is additional evidence of increasing risk of advanced stage at diagnosis in older age for ovarian, renal and lung cancer. Although we would not expect the findings to be necessarily concordant with those from other country populations, in general they are in keeping with literature documenting the presence of a variable degree of socio-demographic variation in stage at diagnosis for a few common cancers.10,11,12,13,14,15,16 However, compared with most previous studies, we consider a larger number of common and rarer cancers together, take into account screening detection status and tumour morphology and estimate the population impact of inequalities in stage at diagnosis. Across our analysis sample, cancer site was associated with the largest amount of variation in stage at diagnosis, which would support accounting for cancer site case-mix in summary indicators of cancer stage at diagnosis for geographically defined populations. Regarding lung cancer in particular, we did not observe an association between stage at diagnosis and deprivation; this is consistent with other evidence and a meta-analysis of the global literature on socioeconomic status and stage at diagnosis of lung cancer.22,23,24

Strengths and limitations

A strength of our study is that we were able to adjust for screening detection status, although this adjustment would not take into account any indirect impact of screening on non-screening detected cases. We found that adjustment for screening detection status attenuates associations with older age substantially (as can be expected, as screening is targeted to specific age groups) but has a relatively limited impact on differences in stage at diagnosis by deprivation. A further strength is that we have adjusted for tumour morphology differences. We could not adjust for use of prostate-specific antigen (PSA) testing, and some of the socio-demographic variation in advanced stage at diagnosis of prostate cancer may reflect greater use of PSA testing in the least deprived groups.25 As our study was motivated by the public health target for attaining earlier stage diagnosis for 75% of cancer patients by 2028, we did not consider the translation of differences in stage at diagnosis into differences in life expectancy and number of years lost due to inequalities. Eliminating differences in stage at diagnosis between different age groups and cancer sites will translate to variable impact in life years gained (depending on the age and cancer site case-mix of the cancers that will be diagnosed at an earlier stage). These questions should be addressed by future research.

There were inconsistent differences in stage at diagnosis by sex, with women being at greater risk of advanced stage than men for bladder cancer and at lower risk for lung, melanoma and renal cancer, with minimal variation for colon and rectal cancer. These heterogeneous patterns of sex differences in stage at diagnosis point to variable, cancer site-specific aetiologies. For example, prior research indicates that lung tumours tend to grow faster in men than women—consistent with the pattern we observed.26 Social factors may contribute to sex differences in stage at diagnosis for melanoma, such as differences between men and women in bodily awareness and help-seeking behaviour. Lastly, healthcare-related factors such as prolonged intervals from presentation to referral in women may be at least partly responsible for sex differences in stage at diagnosis of bladder cancer.27,28

Stage data were highly complete overall, though, as is the case for studies using nationwide population-based registry data, a small minority of patients had missing stage information. Concordant with best practice in this field, we used multiple imputation to assign stage (using information from several auxiliary variables included in the imputation but not the analysis model), which mitigates this limitation.29,30,31,32 Complete case analysis provided very similar findings.

Consistent with the public policy target that motivated our research, we have not considered the potential impact of elimination of variation in stage at diagnosis on inequalities in survival or life years gained by the patient group.33


The findings suggest the presence of common reasons for differences in stage at diagnosis among older people and between deprivation groups, across cancer sites. These may include psychosocial factors acting as barriers to prompt presentation/help-seeking.34,35 Targeted public health awareness campaigns focussing on specific cancers and/or population groups at higher risk of advanced stage disease would therefore be justified given the findings, particularly for cancer sites with a symptom signature dominated by symptoms with relatively high positive predictive value, for example for melanoma (melanotic skin lesion), breast (breast lump), bladder (haematuria), rectal (rectal bleeding) and endometrial cancer (post-menopausal bleeding).36 The majority of cancer patients with these symptoms have non-advanced stage disease.37

Although deprivation is not associated with variation in stage at diagnosis of lung cancer, given the very strong socioeconomic gradients in incidence, preventive efforts (e.g. through smoking cessation policies) are strongly justified. Increasing participation in bowel cancer screening can also help to achieve a favourable earlier stage shift.

Although there are clear socio-demographic inequalities in stage at diagnosis and their potential reduction would contribute substantially to achieving earlier stage diagnosis targets, it would only help ‘bridge’ around a quarter of the ‘distance-to-target’. Therefore, public health strategies to improve the distribution of stage at diagnosis of cancer should additionally focus on the whole population, rather than only on socio-demographic groups at higher risk. Novel diagnostic tests and strategies are additionally needed to enable earlier detection of cancer in both asymptomatic and symptomatic patients of any socio-demographic group.38