Improving cancer survival is a key challenge identified in ‘Improving Outcomes: A Strategy for Cancer’ (Department of Health, 2011). Cancer survival estimates in England currently fall below those in many European countries. If cancer survival in England were comparable to the European average, then 5000 or more deaths within 5 years of diagnosis could be avoided (Abdel-Rahman et al, 2009; Richards, 2009). The observed lower survival in the first year after diagnosis in England can largely be interpreted as evidence of later diagnosis compared with Europe (Thomson and Forman, 2009). Studies comparing England, Norway and Sweden have also identified a higher number of excess deaths in England, predominantly within the first year of diagnosis, which mainly occur in older patients (Holmberg et al, 2010; Møller et al, 2010; Morris et al, 2011). Later diagnosis can be caused by delays in presentation, primary care delay, delays between primary and secondary care, and secondary care delay (Rubin et al, 2011). The National Awareness and Early Diagnosis Initiative announced in the Cancer Reform Strategy (Department of Health, 2007) aims to coordinate and provide support to activities and research that promote the earlier diagnosis of cancer. Identifying and categorising the routes taken by patients to their cancer diagnoses will reveal any survival differences across different presentation routes and help our understanding of how patients with poor prognosis enter secondary care. This could inform targeted implementation of awareness and early diagnosis initiatives, and enable assessment of their success. Identifying different routes for patients will also enable further specific research to be undertaken on a cancer type by type basis to improve understanding of cancer presentation, as well as helping to focus improvements in service delivery for patients with poor prognosis.

Previous studies of routes to diagnosis have mainly focussed on the impact of the Two-Week Wait (TWW) referral system introduced in 2000 (whereby patients being urgently referred for suspected cancer by their GP can expect to be seen by a specialist within 2 weeks). They examined patient cohorts at a single secondary care unit or geographically clustered GP practices (Barrett and Hamilton, 2005, 2008; Blick et al, 2010), or reviewed such studies (Thorne et al, 2009). Overall, previous studies show variation in route to diagnosis by cancer type, but also consistently show a large fraction of cases not following routine, urgent or TWW GP Referral routes.

This study explores the feasibility of using routinely collected data to evaluate how patients resident in England diagnosed with malignant cancers between 2006 and 2008 (739 667 tumours) accessed secondary care for cancer diagnosis, and whether these ‘Routes to Diagnosis’ are associated with differences in survival.

Materials and methods

A ‘Route to Diagnosis’ is defined as the sequence of interactions between the patient and the health care system that lead to a diagnosis of cancer, based on the setting of diagnosis, the pathway and the referral route into secondary care. In many cases, this route begins with a GP consultation. Currently available data limit the portion of the route that is observable in national data sets to that within the screening service and secondary care, although this does include the referral method from primary care.

Many routes involve multiple interactions with different parts of the health care system. A large number of individual routes can be defined by combining the setting of diagnosis, the pathway and the referral route, with 71 distinct combinations identified in this cohort. To be useful for analytical purposes, these must be aggregated into a manageable number of broader categories.

Upon examination, two categories were identified, which represent qualitatively different routes (Screen-Detected and Death Certificate Only (DCO)). Three routes reflect the urgency of referral (Emergency, TWW Referral and other GP Referral). Two further routes represent cases for which the route apparently started in secondary care (Inpatient Electives and Other Outpatients) and, finally, one reflects cases with no useful information available on the route to diagnosis (Unknowns). These eight groups are detailed in Table 1.

Table 1 The eight routes used to categorise all tumours

Cancer registration records for all newly diagnosed malignant tumours excluding non-melanoma skin cancer (ICD-10 C00-C97 excluding C44) diagnosed between 2006 and 2008 in residents of England were extracted from the National Cancer Data Repository (NCDR; National Cancer Intelligence Network, 2010). These cancer registration records were linked at patient level to the administrative Inpatient and Outpatient HES data sets from 2003–2004 to 2008–2009 (NHS Information Centre for Health and Social Care, 2011); the National Cancer Waiting Times (CWT) Monitoring data set from November 2005 to January 2009 (Information Standards Board, 2002); National Breast Screening Programme data from 2005 to 2008 (Association of Breast Surgery, 2011); and National Bowel Screening Programme data from 2006 to 2008 (NHS Bowel Cancer Screening Programme, 2011). These data sets were linked using the unique NHS number that is assigned to each patient in England and which is present in nearly every patient level record in each of the data sets (completeness greater than 98.5% in all data sets, except outpatient HES data which could not be directly assessed). The gynaecological screening status recorded within the NCDR provided screening identification for cervical tumours. The NCDR data set was deduplicated using European Network of Cancer Registries (ENCR) criteria (Parkin et al, 1994), removing 7.0% of cases.

The Routes to Diagnosis algorithm first used HES data to categorise the route for each tumour individually. National Screening Programme and CWT data linked by NHS number to the cancer registration record were then examined with the assignment of route potentially changing to either a ‘Screening’ or ‘TWW’ route.

Figure 1 shows the categorisation of each case into a route using HES data. A specific inpatient or outpatient episode was identified in HES as the ‘end point’ of the route by its proximity to the date of diagnosis (defined by standard registration rules using ENCR criteria (Tyczynski et al, 2003)). The end point was assumed to be the clinical care event that led most immediately to diagnosis. Having defined the end point, the algorithm sought a start point of the route. The start point was determined by working backwards from the end point, as shown in Figures 1, 2 and 3, and varied both in the care setting and in the length of time before diagnosis. The characteristics of this start point determined the categorisation of the route.

Figure 1 Flow diagram for allocating the end point of the route using inpatient and outpatient data.

Figure 2 Flow diagram for finding the start point or prior step for an inpatient step in a route.

Figure 3 Flow diagram for finding the start point or prior step for an outpatient step in a route.

Where both inpatient and outpatient activity occurred on the date of diagnosis, the inpatient episode was defined as the end point of the route. Otherwise, if there was an episode within 28 days before the date of diagnosis, then this was assigned as the end point of the route, with inpatient episodes taking precedence over outpatient episodes, and the most recent episode taking precedence if there were multiple episodes. If there was no HES activity within 28 days of diagnosis, then the most recent episode within 6 months (inpatient or outpatient) was used as the end point of the route. For cases with no HES activity in the 6 months before date of diagnosis, the route was classified as Unknown, or as DCO for cases recorded on the NCDR as being assigned DCO status by cancer registries.
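The end-point rules above can be illustrated with a minimal Python sketch. It is not the SQL implementation used in the study: the `Episode` record, the function names and the 183-day approximation of "6 months" are hypothetical, and the sketch assumes that any activity on the date of diagnosis is considered before earlier activity and that activity after the date of diagnosis is ignored.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Episode:
    """Hypothetical, simplified HES record (not the real HES schema)."""
    setting: str  # "inpatient" or "outpatient"
    day: date     # admission or attendance date


def _prefer_inpatient_then_recent(candidates: Sequence[Episode]) -> Episode:
    # Inpatient beats outpatient; ties are broken by the most recent date.
    return max(candidates, key=lambda e: (e.setting == "inpatient", e.day))


def select_end_point(
    diagnosis: date,
    episodes: Sequence[Episode],
    six_months: timedelta = timedelta(days=183),  # assumed length of "6 months"
) -> Optional[Episode]:
    """Return the episode treated as the end point of the route, or None."""
    # Assumption: only activity on or before the date of diagnosis is used.
    prior = [e for e in episodes if e.day <= diagnosis]

    # 1. Activity on the date of diagnosis, inpatient taking precedence.
    same_day = [e for e in prior if e.day == diagnosis]
    if same_day:
        return _prefer_inpatient_then_recent(same_day)

    # 2. Activity within 28 days before diagnosis.
    within_28 = [e for e in prior if (diagnosis - e.day).days <= 28]
    if within_28:
        return _prefer_inpatient_then_recent(within_28)

    # 3. Most recent activity within 6 months, regardless of setting.
    within_6m = [e for e in prior if diagnosis - e.day <= six_months]
    if within_6m:
        return max(within_6m, key=lambda e: e.day)

    # 4. No usable HES activity: the case becomes Unknown (or DCO).
    return None
```

A `None` result corresponds to the Unknown route, or to DCO where the registry has flagged the case as such.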

Figure 2 shows the steps taken to seek a start point to the route when the end point was an inpatient admission. The method of admission of the end point was examined to determine the preceding step in the route. Where the method of admission was emergency in nature, this episode was defined as the start point of the route (as well as the end point) and the route was classified as an Emergency Presentation. Where the method of admission was a transfer, the most recent inpatient activity in the 6 months before the admission was examined in an iterative fashion. Where the method of admission indicated a previous outpatient attendance, the most recent outpatient activity in the 6 months before the admission was identified and the source of referral of this outpatient attendance examined as described below, except that if there was no outpatient activity within 6 months before the inpatient admission, the route was classified as Inpatient Elective. Otherwise, the route was classified as Unknown or Inpatient Elective according to the codes listed in Table 1.

Figure 3 shows the examination of the outpatient source of referral when the end point of the route was an outpatient attendance. Where the outpatient attendance was not the first appointment of an outpatient episode, the first appointment in that episode was examined. If the source of referral indicated referral from a previous outpatient episode, then the most recent first outpatient attendance within the previous 6 months was examined. Otherwise, the route was classified as Emergency Presentation, GP Referral or Other Outpatient according to the codes in Table 1.

After routes were allocated to each case from the HES data, the screening and CWT data were examined. Where a case could be linked to a CWT urgent referral for suspected cancer, it was categorised as a TWW route, unless the route categorised using the HES data was an Emergency Presentation with an admission date within 28 days before the decision to treat date. Where the case could be linked to a screening event, the route was categorised as Screening. If both were possible, then a Screen-Detected route took priority over a TWW route (route priorities are summarised in Table 1).
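The override rules in this step can be sketched as a small Python function. This is an illustrative reading of the text rather than the study's SQL code: the function name, argument names and route labels are assumptions, and the linkage flags are taken as already computed.

```python
from datetime import date
from typing import Optional


def final_route(
    hes_route: str,
    screen_linked: bool,
    tww_linked: bool,
    emergency_admission: Optional[date] = None,
    decision_to_treat: Optional[date] = None,
) -> str:
    """Apply screening and CWT overrides to a route derived from HES data."""
    # A linked screening event takes priority over everything else here.
    if screen_linked:
        return "Screen-Detected"

    if tww_linked:
        # Exception: keep an Emergency Presentation when the admission fell
        # within 28 days before the decision to treat date.
        if (
            hes_route == "Emergency Presentation"
            and emergency_admission is not None
            and decision_to_treat is not None
            and 0 <= (decision_to_treat - emergency_admission).days <= 28
        ):
            return "Emergency Presentation"
        return "TWW"

    # No linked screening or CWT record: the HES-derived route stands.
    return hes_route
```

Keeping the priority order in one function makes it straightforward to reorder the checks when testing how sensitive the route proportions are to the chosen priorities.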

A case was linked to a CWT referral where a TWW had a decision to treat date within 31 days before or 62 days after the date of diagnosis. A case was linked to a breast screening event where the breast screening assessment date was within 91 days before or 31 days after the date of diagnosis. For colorectal and cervical screening data, the determination that the case was Screen-Detected had been made by the NHS Bowel Cancer Screening Programme (2011) or the regional cancer registries, respectively, and no matching by date was performed.

The algorithm was written in SQL within a Microsoft SQL Server 2005 database environment. Confidence intervals for proportions were calculated using the Wilson score interval (Newcombe, 1998). Point estimates and confidence intervals for relative survival were calculated in the statistical package STATA version 10 using the strel programme and age-sex-region-deprivation-year lifetables (Cancer Research UK Cancer Survival Group, 2006). Cases were excluded from the analysis if sex, date of birth or date of diagnosis was missing, or if the patient was aged over 99 years at the time of diagnosis. No further stratification by age or case censorship was performed.
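For readers unfamiliar with the Wilson score interval, a minimal Python sketch of the standard formula follows. The function name and the default 95% critical value are choices made for this illustration, not details taken from the study.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z is the standard normal critical value (1.959964 for a 95% interval).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")

    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    # Centre of the interval is pulled from p towards 0.5.
    centre = (p + z2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return centre - half_width, centre + half_width
```

Unlike the simple normal approximation, this interval stays within the range 0 to 1 and remains usable for the small proportions seen for some route and cancer type combinations.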


Results

The proportion of the 739 667 categorised tumours diagnosed via each route is shown in Table 2. Most cancers were diagnosed through one of Emergency Presentation (24%), TWW (26%) or GP Referral (21%), with the other five routes making up 29%. These proportions vary considerably with cancer type.

Table 2 Proportion of tumours by route, for selected tumours

The proportion of Emergency Presentations increases with increasing age, whereas TWW and GP Referral routes decrease with age. Unknown routes are highest in the under 50 years age group, whereas DCOs are highest in the 85+ years age group. Screening proportions show the effect of the breast screening age range. The proportion of Emergency Presentations amongst children (age 0–14 years) was 54% for all tumours with low TWW proportions (2% overall, data not shown). The proportions for teenagers and young adults (age 15–24 years) were more reflective of the overall cohort, with 24% presenting as an Emergency Presentation for all tumours and higher rates for some sites (e.g., 57% for colorectal, data not shown).

The proportion of routes changed little over the 3 years 2006–2008 (data not shown). For all cancers combined, the proportion categorised as a TWW route increased from 25 to 27%. The proportion categorised as Emergency Presentation was 24% in 2006 and 2007, and 23% in 2008. The proportion of Screen-Detected colorectal cases increased from 0.1% in 2006 to 5% in 2008, reflecting the staged rollout of the NHS Bowel Cancer Screening Programme (2011).

Comparatively small but statistically significant (P<0.001, z-test for difference in proportions) increases in the proportion of tumours diagnosed via a TWW route were seen between 2006 and 2008 for bladder (5%), oropharynx (10%), larynx (7%), melanoma (6%), prostate (5%) and uterine (4%) cancers.
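As an illustration of the test named above, the following Python sketch implements a z-test for the difference between two proportions. The text does not state whether a pooled variance or a two-sided alternative was used; both are assumptions here, as are the function and argument names.

```python
from math import erfc, sqrt
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Return (z, two-sided P value) for H0: the two proportions are equal.

    x1 of n1 cases follow the route in the first period, x2 of n2 in the second.
    Assumes a pooled standard error and a two-sided alternative.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value
```

For example, `two_proportion_z_test(2500, 10000, 2700, 10000)` compares made-up counts of TWW cases in two years; these counts are purely illustrative and are not taken from the study data.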

One-year relative survival estimates are presented by route in Table 3, although survival is not calculated for DCO routes (Parkin and Hakulinen, 1991). Across all cancer types, 1-year relative survival was significantly lower for cases categorised as an Emergency Presentation than for those presenting via other routes. Unknown routes have a comparable survival to other non-Emergency Presentation routes. Where present, the highest relative survival estimates are for Screen-Detected routes.

Table 3 One-year relative survival by route, for selected tumours with 95% confidence intervals

Sensitivity analysis

The algorithm seeks an inpatient episode within 28 days before diagnosis, and then, any HES episode within 6 months before diagnosis as the end point of the route. Eighty-five percent of cases could be assigned a route from HES data. Of these cases, 74% have a start point within a month before the date of diagnosis. This increases to 84% that have a start point within 3 months and 98% within 6 months before the date of diagnosis.

Overall, 94% of routes to diagnosis were derived either from non-HES data (39%) or from HES data with an end point within 28 days of the date of diagnosis (56%); only 4% of routes were derived from HES data with an end point between 28 days and 6 months before the date of diagnosis.

The frequency of (inpatient) admission in the month before diagnosis is 19 times higher than that in the equivalent month a year before diagnosis (across the whole patient cohort). For persons aged 85+ years, this ratio is 14 : 1.

Overall, 6.2% of tumours were diagnosed in patients with more than one invasive cancer, excluding non-melanoma skin cancer (ICD-10 C00-C97 excluding C44), between 2006 and 2008. If these multiple tumours were excluded, then the overall proportion of Emergency routes increased by less than 0.1% and the overall proportion of Unknown routes increased by 0.2%; other route proportions changed by less than 0.5%. The maximum change in all combinations of route and cancer type on including multiple tumours was 1.7%, with a mean absolute change of 0.2%.

If Emergency Presentation routes are given the highest priority, followed by TWWs, Screening and others in that order, then Emergency Presentations rise by 0.6%, TWWs drop by 0.4% and Screening drops by 0.2%. If Screen-Detected routes are given the highest priority, followed by TWWs, Emergency Presentations and others in that order, then Emergency Presentations drop by 1.2%, TWWs rise by 1.2% and Screening is unchanged.

Screening data and CWT data are linked specifically to the cancer record rather than to the patient as for HES data. As such, these data are treated as more robust and, therefore, routes generated from them supersede the route generated from the hospital admissions records, with the exception of Emergency Presentations admitted within 28 days before the decision to treat date. There is an impact of less than 0.2% of cases if Screen-Detected routes were not prioritised above Emergency Presentations.

A TWW record was used to categorise the route if the decision to treat date fell within a 3-month period from 31 days before to 62 days after the diagnosis date. A screening record was used to categorise the route if the screening date fell within a 4-month period from 91 days before to 31 days after the diagnosis date. These periods were chosen to correspond to the typical timescales of these patient pathways, and to take account of cancer registration rules, which preferentially define the date of diagnosis from pathological confirmation. Sensitivity analysis showed that the route breakdowns were not greatly affected by changes of a month in the length of the screening date periods; a reduction of 4% in the proportion of TWW routes was observed if the TWW matching period was reduced to 1 month before to 1 month after the diagnosis date. Prioritising all TWW routes above Emergency Presentations reduces the observed proportion of Emergency Presentations by around 1%.
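The matching periods described above lend themselves to a parameterised check, which makes it easy to vary the windows as in the sensitivity analysis. The following Python sketch is illustrative only: the function names are hypothetical and the window boundaries are assumed to be inclusive.

```python
from datetime import date


def within_window(event: date, diagnosis: date, days_before: int, days_after: int) -> bool:
    """True if event lies in [diagnosis - days_before, diagnosis + days_after]."""
    offset = (event - diagnosis).days  # negative when the event precedes diagnosis
    return -days_before <= offset <= days_after


def tww_linked(decision_to_treat: date, diagnosis: date,
               days_before: int = 31, days_after: int = 62) -> bool:
    # Default window: 31 days before to 62 days after the diagnosis date.
    return within_window(decision_to_treat, diagnosis, days_before, days_after)


def breast_screen_linked(screening_date: date, diagnosis: date,
                         days_before: int = 91, days_after: int = 31) -> bool:
    # Default window: 91 days before to 31 days after the diagnosis date.
    return within_window(screening_date, diagnosis, days_before, days_after)
```

Calling `tww_linked(dtt, dx, days_before=30, days_after=30)` approximates the narrower "1 month before to 1 month after" period examined in the sensitivity analysis, taking a month as 30 days.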


Discussion

The routes to diagnosis algorithm

A central assumption underlying the algorithm is that it is reasonable to suppose that inpatient and outpatient hospital activity up to 6 months, and in particular in the 28 days before diagnosis, is linked to the diagnosis of the cancer. This activity may not necessarily be directly caused by the cancer itself as diagnosis can result from other clinical investigations, for example, radiological examination for an unrelated condition.

The higher frequency of activity in the month before diagnosis compared with that a year earlier indicates that the majority of hospital activity in the 28 days before diagnosis is related to the diagnosis of cancer. Making the conservative assumption that a ‘background’ event has an equal chance of being picked up by the algorithm as events related to the diagnosis, and the further conservative assumption that such events will give an incorrect aggregated route, allows us to estimate an overall upper bound of 10% on the error rate of the algorithm due to ‘background’ admissions to hospital. The resultant uncertainty in the proportion of cancers diagnosed via a route accounting for 25% of cancers would be approximately 2.5 percentage points overall, and slightly higher for persons aged 85+ years. A small bias toward non-Emergency Presentations might be expected for older persons, because the majority of their higher ‘background’ admission rate consists of elective admissions. Similarly, small systematic effects in specific patient groups with pre-existing comorbid conditions will exist, with the resulting bias depending on the typical nature of an admission for the comorbid condition.

The algorithm does not attempt to match diagnosed cancers to cancer-specific inpatient or outpatient HES records. The majority of outpatient records do not have diagnostic coding, and even where it does exist in outpatient or inpatient records, it is likely that the episodes of interest (being pre-diagnostic) would not include cancer-specific codes. The algorithm only relies on the existence of attendance and episode records, and the associated administrative fields recording source of referral and method of admission, making the calculated routes insensitive to variation in clinical coding.

The methodology presented has several adjustable parameters. The inclusion of multiple tumours did not substantially affect the magnitude of the variation in frequency between routes and they were therefore included in the analysis (Brenner and Hakulinen, 2007; Rosso et al, 2009). Changing the priority of Emergency Presentations with respect to Screening and TWW routes slightly (approximately 1% or less) alters the proportions of cases categorised as each route. The lack of overlap between Emergency Presentation and Screen-Detected routes is reassuring as the majority of Screen-Detected cases are early-stage tumours that are less likely to result in an Emergency Presentation.

Although the methodology allows the assignment of a Route to Diagnosis, it is not intended to identify presenting symptoms of either the cancer or of other illnesses, which may have then led to the cancer diagnosis. Further site-specific research is required to understand the complex nature of what causes patients to follow their Route to Diagnosis for each tumour.

Route frequencies and impact on survival

In every tumour type examined, 1-year relative survival was significantly lower for Emergency Presentations than for any other route. The magnitude of the difference in survival between Emergency and non-Emergency routes is typically in the range of 20–40%. The higher proportion of older people with Emergency Presentations may partly explain this difference in survival. One-year relative survival is lower for the TWW route compared with other non-emergency routes for several cancer sites, including cancers of the central nervous system, liver and lower GI cancers. Given the comparative rapidity of TWW referrals, this could be an example of a waiting time paradox (Crawford et al, 2002). This is consistent with other studies (Tørring et al, 2011) showing that outcomes were worse for the most urgently referred cases.

Cases allocated an Unknown route have a cancer registration (not based solely on a death certificate), but no data can be found in HES in the 6 months before diagnosis, or in CWT or screening sources. The higher proportion of Unknown routes in people under 50 and in the more socio-economically affluent (data not shown) may indicate a higher fraction of private referrals in this group. The effect of age is also seen in the National Audit of Cancer Diagnosis in Primary Care (Rubin et al, 2011), which indicates that private referrals are less common in people over 70. The relatively large proportion (18%) of tumours assigned to the Unknown route for melanoma might be due to tumours being removed in primary care where melanoma was not suspected. The survival of the Unknown routes is comparable to that of other non-emergency routes across all cancer types, suggesting delivery of care via non-emergency settings.

Comparisons with other studies

This study calculates proportions of routes at a population level using nationally defined data sets. When comparing these proportions with previous studies, the nature of the patient cohort should be considered. Patient cohorts from primary care may under-record Screen-Detected cases, incidental diagnoses made while under hospital care and Emergency Presentations that result in death shortly after diagnosis. Patient cohorts from secondary care may under-report clinical diagnoses if case finding relied on histopathological databases. In addition, statistical fluctuations in the observed proportions may occur in studies conducted at single centres because of the comparatively low number of cases diagnosed for each tumour type over the study period.

Comparable results from other studies are presented in Table 4. There is generally a good agreement between the proportion of cases diagnosed via TWW in our study and the majority of those studies in which case finding was done in secondary care (Neal et al, 2007; Thorne et al, 2009; Blick et al, 2010). The proportion of Emergency Presentations seen in this study is also comparable to those seen in studies with case finding via secondary care or cancer registries (Barrett and Hamilton, 2005; Blick et al, 2010).

Table 4 Proportion of tumours in selected comparable studies for GP, TWW and Emergency routes

The total proportion of cases which present via GP Referral is higher in the studies examined compared with this study. This might be explained by the inclusion in this study of separate categories for Screen-Detected, Unknown, Other Outpatient and Inpatient Elective routes. In particular, it is possible that the majority of cases in the Other Outpatient and Inpatient Elective routes were originally initiated by a GP Referral. Further work linking Routes to Diagnosis results to GP data sets is needed to explore this supposition.

Although some of the eight routes are specific to the English health care system, the methodology can be used in other countries where data sets detailing episodes of hospital care exist. The routes of TWW, GP Referral, Inpatient Elective and Other Outpatient could all be considered to have originated from a GP Referral. Thus, a more general international comparison would be possible using only five distinct routes, with these four forming an overall ‘GP-Initiated’ route.
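The proposed collapse from eight routes to five can be written as a simple mapping. The Python sketch below uses route labels chosen for illustration; only the grouping itself comes from the text.

```python
# The four routes that could be considered to originate from a GP Referral.
GP_INITIATED = frozenset(
    {"TWW", "GP Referral", "Inpatient Elective", "Other Outpatient"}
)

EIGHT_ROUTES = (
    "Screen-Detected",
    "TWW",
    "GP Referral",
    "Other Outpatient",
    "Inpatient Elective",
    "Emergency Presentation",
    "DCO",
    "Unknown",
)


def to_international_route(route: str) -> str:
    """Collapse the four GP-originated routes into a single 'GP-Initiated' route."""
    return "GP-Initiated" if route in GP_INITIATED else route


# The distinct values after mapping form the five-route scheme.
FIVE_ROUTES = sorted({to_international_route(r) for r in EIGHT_ROUTES})
```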

In summary, we have demonstrated a methodology for categorising a Route to Diagnosis for all registered tumours, using large routinely available health service data sets. This can be applied in an automated fashion to all patients diagnosed with cancer in England who are recorded by the cancer registries, and enables research to be undertaken to understand differences within these groups. The frequencies with which these routes are followed in the diagnosis of cancer are in reasonable agreement with previous clinical studies, and show plausible variation by cancer type and age. The route has a significant association with 1-year relative survival. In particular, the substantially lower relative survival in Emergency compared with non-emergency routes indicates that this distinction is of high clinical significance.