Which patients are not included in the English Cancer Waiting Times monitoring dataset, 2009-2013? Implications for use of the data in research.

BACKGROUND
Cancer waiting time targets are routinely monitored in England, but the Cancer Waiting Times monitoring dataset (CWT) does not include all eligible patients, introducing scope for bias.


METHODS
Data from adults diagnosed in England (2009-2013) with colorectal, lung, or ovarian cancer were linked from CWT to cancer registry, mortality, and Hospital Episode Statistics data. We present demographic characteristics and net survival for patients who were and were not included in CWT.


RESULTS
A CWT record was found for 82% of colorectal, 76% of lung, and 77% of ovarian cancer patients. Patients not recorded in CWT were more likely to be in the youngest or oldest age groups, have more comorbidities, have been diagnosed through emergency presentation, have late or missing stage, and have much poorer survival.


CONCLUSIONS
Researchers and policy-makers should be aware of the limitations in the completeness and representativeness of CWT, and draw conclusions with appropriate caution.

Successive national cancer plans and strategies for England have included targets to reduce waiting times to diagnosis and treatment for all cancers (Department of Health, 2000, 2007, 2011; Independent Cancer Taskforce, 2015). These include a maximum 2-week wait (TWW) from an urgent referral by a general practitioner (GP) to being seen by a specialist, a maximum 62 days from the GP's urgent referral to the start of first treatment, and a maximum 31 days from the decision to treat a patient to the start of treatment. A new 28-day target to confirm or exclude a cancer diagnosis has also been proposed (Independent Cancer Taskforce, 2015).
Waiting time targets are considered to be important indicators of the quality of cancer care. National cancer waiting time statistics have been published quarterly by NHS England since 2013-2014 (Cancer Waiting Times Team, 2016) and previously by the Department of Health. Performance varies widely across the country: many Clinical Commissioning Groups fall below current operational standards, and adherence to the 62-day target has been decreasing since 2014 (Cancer Waiting Times Team, 2016).
The English National Cancer Waiting Times monitoring dataset (CWT) is the basis for these official statistics. The data are collected by 'the provider that is commissioned to deliver the activity' (Cancer Waiting Times Team, 2015). CWT only contains diagnosis and treatment information on cancer patients who were offered treatment within the NHS, including those who refused treatment or were assigned to active monitoring (Cancer Waiting Times Team, 2015), whichever diagnosis route they came through. However, not all eligible cancer patients are included and the extent of incompleteness is unclear. We examine recent CWT data for three cancers, linked to individual cancer patient data, to compare the characteristics of patients who were and were not included in CWT.

MATERIALS AND METHODS
Data sources and study population. All adults (15-99 years) who were diagnosed in England during 2009-2013 with colorectal, non-small cell lung or ovarian cancer were included.
CWT diagnosis and treatment data were linked at individual level with the national cancer registry data (including vital status at 31 December 2014), the Hospital Episode Statistics data, and the Routes to Diagnosis dataset (Elliss-Brookes et al, 2012). Demographic information included age, sex, and deprivation quintile. Stage at diagnosis and Charlson Comorbidity Index score (Charlson et al, 1987) were derived using algorithms applied to these datasets and audit data, where available (Benitez-Majano et al, 2016; Maringe et al, 2017).
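The idea of flagging which registry patients have a CWT record can be illustrated with a minimal sketch. It assumes that each dataset carries a shared pseudonymised patient identifier. The field names (`patient_id`, `age_group`), the helper functions, and the toy records are hypothetical and are not taken from the study data, and the sketch does not reproduce the actual linkage procedure.

```python
from collections import defaultdict

# Hypothetical registry extract: one row per patient (illustrative values only).
registry = [
    {"patient_id": "A1", "age_group": "15-44"},
    {"patient_id": "A2", "age_group": "45-69"},
    {"patient_id": "A3", "age_group": "45-69"},
    {"patient_id": "A4", "age_group": "70-99"},
    {"patient_id": "A5", "age_group": "70-99"},
]

# Hypothetical set of identifiers that have at least one CWT record.
cwt_ids = {"A2", "A3", "A4"}


def flag_cwt_inclusion(registry_rows, cwt_patient_ids):
    """Return a copy of each registry row with an `in_cwt` flag added."""
    return [
        {**row, "in_cwt": row["patient_id"] in cwt_patient_ids}
        for row in registry_rows
    ]


def percent_included_by(rows, key):
    """Percentage of patients with a CWT record within each level of `key`."""
    totals = defaultdict(int)
    included = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        included[row[key]] += row["in_cwt"]  # True counts as 1, False as 0
    return {level: 100.0 * included[level] / totals[level] for level in totals}


linked = flag_cwt_inclusion(registry, cwt_ids)
by_age = percent_included_by(linked, "age_group")
```

Using the registry as the denominator, rather than CWT, is what allows patients without a CWT record to be counted at all.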
Statistical analysis. We examined the sociodemographic and clinical characteristics of cancer patients and estimated net survival at 1 year. We report results by patient age group, deprivation quintile, and tumour stage. Net survival can be interpreted as survival from the cancer, accounting for the mortality from other causes, using life tables of the England general population stratified by age, sex, calendar year, and region (Spika, 2015).
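As general background on what net survival measures, the relative survival framework writes the observed mortality hazard as the sum of an expected background hazard, taken from life tables, and an excess hazard attributed to the cancer. The notation below is introduced here for illustration only and is not the specific estimator used in this study.

```latex
\lambda_O(t) = \lambda_P(t) + \lambda_E(t),
\qquad
S_N(t) = \exp\!\left(-\int_0^t \lambda_E(u)\,\mathrm{d}u\right)
```

Here $\lambda_O$ is the observed hazard of death from any cause, $\lambda_P$ is the expected hazard in the matched general population, $\lambda_E$ is the excess hazard, and $S_N(t)$ is net survival at time $t$ (for example, $t = 1$ year).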

RESULTS
The completeness of CWT improved slightly during 2009-2011, more for lung and ovarian cancers than for colorectal cancer, then plateaued until 2013. The percentage of patients included varied by patient and tumour factors, with nearly all differences statistically significant in a χ² test at P<0.001 (Table 1, Figure 1).
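For a single binary factor, a comparison of this kind reduces to a Pearson chi-squared test on a 2x2 table of factor level by CWT inclusion. The sketch below uses only the standard library and made-up counts. The function name and the numbers are illustrative assumptions, and the tests reported here may have involved factors with more than two categories.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Table layout:
                  in CWT   not in CWT
        group 1     a          b
        group 2     c          d

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Made-up counts for illustration only (not study data):
# 80% of group 1 and 70% of group 2 have a CWT record.
stat, p = chi2_2x2(800, 200, 700, 300)
```

With cohorts as large as a national registry, even small differences in inclusion produce very small P-values, so the size of the difference matters more than its significance.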
There was a strong J-shaped age pattern in the probability of inclusion in CWT (Figure 1): the youngest and, especially, the oldest patients were least likely to be included. Among those older than 70 years, over a quarter with colorectal cancer and around 40% with lung or ovarian cancer had no record in CWT. More affluent patients were less likely to have a CWT record. Women with colorectal cancer were slightly less likely to have a CWT record than men (80.2 vs 82.7%), but there was no evidence of a difference between the sexes for lung cancer (P = 0.211).
A CWT record was missing for more than half of patients whose route to diagnosis was unknown, and for around a third of those diagnosed through an emergency presentation.
More than 85% of patients with stage I and II tumours were recorded in CWT (Figure 1). Among colorectal and lung cancer patients, those with missing stage were also the most likely to be missing from CWT, although the proportion was similar to patients diagnosed at stage IV. Among women with ovarian cancer, a similar proportion of cases with stage IV and missing stage tumours were not recorded (24%). Patients with more comorbidity were less likely to be recorded in CWT (Figure 1). For women with ovarian cancer, those with no comorbidities were 30% more likely to have a CWT record than those with the most comorbidity.
Patients who died within 30 days of diagnosis were much less likely to have a CWT record: between 60% (colorectal and lung cancers) and 73% (ovarian cancer) of them had no record. However, 15-18% of those who survived at least 30 days were also not captured (Table 1), with a J-shaped age pattern similar to that of the whole cohort.

DISCUSSION
Using a new approach to examine the English CWT, we linked individual data from several sources to describe the characteristics and short-term survival of patients who were not included in the dataset. Around one-fifth of patients diagnosed with colorectal, lung, or ovarian cancer during 2009-2013 did not have a CWT record. Proportions were highest among elderly patients and those with comorbid conditions, mirroring patterns for those with missing stage information (Adams et al, 2004; Worthington et al, 2008). Patients missing from CWT were also more likely to have advanced disease or missing stage information: these factors are highly correlated (Barclay et al, 2016). However, more than a quarter of the youngest patients and more than 10% of those with early-stage disease also lacked a CWT record, suggesting that the recording of CWT data could be improved.
Several mechanisms may help explain the pattern and extent of missing records. Some treatments may not be well recorded (e.g., pain relief or transfusions). The CWT does not include data on patients who died before treatment could commence, even if a decision to treat had been made. However, in our data, only 22%, 44%, and 31% of colorectal, lung, and ovarian cancer patients without a CWT record, respectively, died within 30 days of diagnosis. Services not commissioned by the NHS are also beyond the scope of CWT data collection, so patients treated in the private sector, including palliative care in non-NHS organisations, are not captured (Cancer Waiting Times Team, 2015). The extent of this is unclear, but it is reported that around 11% of the UK population has some private health insurance (The King's Fund, 2014). The proportion of colorectal and ovarian cancer patients included in CWT was indeed lower among those in the most affluent quintile, who are more likely to seek private care.
A CWT record may be missing because of a clinical decision not to treat a patient: this may explain why older and sicker patients are less likely to be included. Indeed, under- or sub-optimal treatment in the elderly has been reported in England (National Cancer Equality Initiative, 2012; Forrest et al, 2014; Lawler et al, 2014).
One-year survival was generally much lower among patients with no CWT record, but no survival differences were found among the youngest patients between those with and without a record. This suggests that younger patients may well have had treatment that was not captured in CWT.
There may be some administrative under-reporting, whereby patients received treatment that was not recorded in CWT. The presence in CWT records of patients who received 'subsequent treatment' without any record of first treatment (5-9% depending on the cancer) lends weight to this. The proportion of patients with no CWT record was also higher among those with missing information on stage and with an unknown route to diagnosis, suggesting that there is a group of patients whose information is generally poorly captured, due to shortcomings in data transmission systems or clinical documentation.
Our novel approach offers a clear picture of the characteristics of patients who are not included in CWT, but it cannot fully illuminate the mechanisms of this incompleteness. The extent to which the missing data create scope for selection bias must be considered, especially if the reason that patients are not included is related to whether they meet the waiting times target. The generalisability of results from CWT to the whole population of cancer patients recorded in the cancer registration data may be limited, because they do not reflect the outcomes for patients who were treated outside the NHS, or of patients who were not captured despite having received some treatment. These sources of bias should be considered when interpreting the results or using the CWT data to evaluate cancer outcomes at patient level.
The CWT dataset is an important source of data: it allows monitoring of key indicators of NHS performance in cancer care and the targets encourage timely treatment. However, researchers and policy-makers should be aware of these limitations in the completeness and representativeness of the CWT data, and should draw any conclusions with appropriate caution.

ACKNOWLEDGEMENTS
The Cancer Waiting Times monitoring dataset was provided by NHS England through Public Health England. We are grateful to the members of the CRUK EDAG Scientific Advisory Group for their help in developing this project. We thank members of the Cancer Survival Group at LSHTM for their advice and support. CDG, SW, SBM, and MM are funded by the Cancer Research UK Early Diagnosis Commissioned Policy Research Programme at the London School of Hygiene and Tropical Medicine (award number C7923/A18348). The funding body collaborated in the design of the study but had no role in the collection and analysis of data, interpretation of results, or in writing the manuscript. The Cancer Survival Group obtained ethical and statutory approvals to use these routinely collected data from the National Research Ethics Service Committee London-Camden & Islington on 28 May 2013 (Research Ethics Committee reference 13/LO/0610, confirmed on 29 January 2015). All the data are anonymised and the researchers had no access to personally identifiable data.