Main

Colorectal cancer (CRC) is the third most common cancer in the United Kingdom, affecting 40 000 individuals and accounting for 16 000 cancer-related deaths each year (Cancer Research UK, 2012). Family history is recognised to be a risk factor for CRC, with relatives of CRC cases having a two- to three-fold increased risk (Johns and Houlston, 2001). Although part of the familial risk can be ascribed to a number of inherited cancer syndromes, most of the heritable risk remains unexplained (Aaltonen et al, 2007).

Significant research effort has been focussed on extending our understanding of inherited susceptibility to CRC and the biological basis of genetic risk factors. Much of this research has been contingent on the development of large case series for gene discovery efforts. For example, within the United Kingdom, the National Study of Colorectal Cancer Genetics (NSCCG) (Penegar et al, 2007; Houlston et al, 2012) has collected DNA and clinicopathological data from >25 000 patients with histologically proven CRC.

As a potential prognostic factor, the concept of germline variation imparting interindividual variability in tumour development, progression and metastasis is receiving increasing attention (Kune et al, 1992; Registry Committee and Japanese Research Society for Cancer of the Colon and Rectum, 1993; Bass et al, 2008; Chan et al, 2008; Zell et al, 2008; Birgisson et al, 2009; Kao et al, 2009; Kirchoff et al, 2012). Some studies have demonstrated a survival advantage for patients with familial CRC (Registry Committee and Japanese Research Society for Cancer of the Colon and Rectum, 1993; Chan et al, 2008; Zell et al, 2008; Birgisson et al, 2009; Kirchoff et al, 2012), but this finding has not been universal (Kune et al, 1992; Bass et al, 2008; Kirchoff et al, 2012).

The ability to relate detailed genetic information to management and outcome in large case series is highly desirable but difficult to achieve. Within the United Kingdom, a potential solution is the National Cancer Data Repository (NCDR) (National Cancer Intelligence Network, 2012), which contains population-based routine administrative National Health Service (NHS) data sets linked together to enable the pathways of all individuals diagnosed with cancer in England to be tracked from diagnosis to cure or death. Inclusion of genetic information captured by studies such as the NSCCG in this resource offers the prospect of relating genotype to phenotype, management and outcome data on a large scale. We sought to assess the feasibility of such a strategy and have investigated the relationship between a family history of CRC and patient outcome.

MATERIALS AND METHODS

Patients and record linkage

Information on CRC patients recruited before September 2011 was obtained from the NSCCG database. As the study period and recruitment area of the NSCCG are not fully compatible with the data held in the NCDR, a number of exclusions were made (Figure 1). First, the NSCCG recruits CRC patients from across the United Kingdom, whereas the NCDR is currently limited to England. Individuals residing outside England were, therefore, excluded. Furthermore, at the time of analysis, the NCDR was only complete for cancers diagnosed between 1990 and 2008, and hence cases recruited into the NSCCG after 2008 were also excluded. The remaining cases were linked to the NCDR using all, or combinations, of the following identifiers: name, NHS number, date of birth, sex, hospital of management/histology, hospital number and postcode at diagnosis.
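
The linkage step described above can be sketched as a deterministic match that tries progressively looser combinations of identifiers. This is an illustration only: the field names (`nhs_number`, `dob`, `sex`, `surname`, `postcode`), the particular combinations and their ordering are assumptions made for the example, as the text does not state the rules actually used to link the NSCCG to the NCDR. The registry is simplified here to one record per person.

```python
from typing import Dict, List, Optional

# Hypothetical identifier combinations, tried from most to least specific.
# The combinations actually used in the study are not given in the text.
MATCH_KEYS = [
    ("nhs_number", "dob", "sex"),
    ("nhs_number", "dob"),
    ("surname", "dob", "sex", "postcode"),
]


def normalise(value: object) -> str:
    """Remove all whitespace and upper-case, so 'ls1 3ex' equals 'LS1 3EX'."""
    return "".join(str(value).split()).upper()


def link_record(case: Dict, registry: List[Dict]) -> Optional[Dict]:
    """Return the single registry record matching `case`, or None."""
    for keys in MATCH_KEYS:
        # Skip a combination if the case lacks any of its identifiers.
        if any(not case.get(k) for k in keys):
            continue
        wanted = tuple(normalise(case[k]) for k in keys)
        hits = [
            r for r in registry
            if all(r.get(k) for k in keys)
            and tuple(normalise(r[k]) for k in keys) == wanted
        ]
        if len(hits) == 1:  # accept unambiguous matches only
            return hits[0]
    return None
```

Accepting only unambiguous matches is a conservative design choice for the sketch; a production linkage would typically add probabilistic scoring and manual review of near matches.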

Figure 1

The results of the NSCCG and NCDR matching process.

The NCDR holds information about all tumours diagnosed in England, allowing NSCCG cases diagnosed with multiple cancers to be matched to multiple records. For NSCCG patients with multiple CRCs, the first diagnosed was considered to be the index tumour and information about this cancer was used in analyses. If an NSCCG patient was linked to the NCDR but not to a CRC record, then that patient was only deemed to match if there was evidence that the tumour recorded by the registry was, indeed, relevant to why the individual had been recruited to the NSCCG (e.g., the registry had recorded an anal tumour rather than a colorectal tumour). NSCCG participants who were linked to any other tumour sites were excluded.

Age at diagnosis was derived from NCDR based on the date of diagnosis of the index tumour. Colonic tumours in the appendix, caecum, ascending colon, hepatic flexure and transverse colon (ICD10 C180-C184) were considered to be right-sided tumours, whereas those at the splenic flexure and in the descending colon, sigmoid colon and rectosigmoid junction were considered to be left-sided tumours (ICD10 C185-C187 and C19). Tumours overlapping two sites in the colon (C188), with no site specified (C189), and all the noncolorectal cancer matches (excluding anal cancers) were included in a category called colon not otherwise specified (NOS). Rectal and anal tumours (ICD10 C20-C21) were assigned to a rectal cancer category.
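
The site grouping above amounts to a lookup on the ICD-10 code. A minimal sketch follows; the function name and the category labels (`"right"`, `"left"`, `"rectal"`, `"colon NOS"`) are illustrative choices and not taken from the study's own code.

```python
def tumour_site(icd10: str) -> str:
    """Map an ICD-10 code (e.g. 'C182' or 'C18.2') to a site category."""
    code = icd10.replace(".", "").strip().upper()
    # Appendix, caecum, ascending colon, hepatic flexure, transverse colon
    if code in {"C180", "C181", "C182", "C183", "C184"}:
        return "right"
    # Splenic flexure, descending colon, sigmoid colon, rectosigmoid junction
    if code in {"C185", "C186", "C187"} or code.startswith("C19"):
        return "left"
    # Rectum and anus
    if code.startswith("C20") or code.startswith("C21"):
        return "rectal"
    # C188 (overlapping sites), C189 (site unspecified) and any accepted
    # non-colorectal match other than anal cancer
    return "colon NOS"
```

For example, `tumour_site("C18.7")` gives `"left"` and `tumour_site("C189")` gives `"colon NOS"`.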

Statistical analysis

Statistical analyses were conducted using Stata version 11.0 (StataCorp, College Station, TX, USA). A two-sided P-value of <0.05 was considered to be significant. Differences in patient characteristics between groups were assessed using χ2 and Kruskal–Wallis tests. Survival was calculated from the date of recruitment to the NSCCG to the date of death or censoring (30 June 2010). Kaplan–Meier graphs, log-rank tests and Cox proportional hazards models were used to investigate the relationship between family history and survival.
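
To make the survival definition concrete, the sketch below derives follow-up time from recruitment to death or censoring on 30 June 2010 and computes a Kaplan–Meier (product-limit) estimate. It is a plain-Python illustration, not the Stata code used in the study; the function names, the use of days as the time unit and the treatment of deaths after the censoring date as censored are assumptions.

```python
from datetime import date
from typing import List, Optional, Tuple

CENSOR_DATE = date(2010, 6, 30)  # censoring date stated in the text


def follow_up(recruited: date, died: Optional[date]) -> Tuple[int, bool]:
    """Return (days of follow-up from recruitment, death observed?)."""
    if died is not None and died <= CENSOR_DATE:
        return (died - recruited).days, True
    # Alive at the censoring date, or death recorded after it (assumption)
    return (CENSOR_DATE - recruited).days, False


def kaplan_meier(data: List[Tuple[int, bool]]) -> List[Tuple[int, float]]:
    """Product-limit estimate: (time, S(time)) at each observed death time."""
    at_risk = len(data)
    survival = 1.0
    curve = []
    for t in sorted({time for time, _ in data}):
        deaths = sum(1 for time, event in data if time == t and event)
        leaving = sum(1 for time, _ in data if time == t)
        if deaths:
            # Multiply by the conditional probability of surviving time t
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored cases both leave the risk set after time t
        at_risk -= leaving
    return curve
```

Applying `kaplan_meier` separately to each family history group would give curves of the kind compared with a log-rank test.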

RESULTS AND DISCUSSION

Of the 21 223 CRC patients recruited to the NSCCG, 10 937 (51.7%) were eligible for matching and, overall, 10 782 (98.6%) were matched to tumours considered eligible (Figure 1); these patients form the cohort used for comparative analyses.

Of this population, 1697 (15.7%) reported on their NSCCG recruitment questionnaire a family history of the disease (defined as a first-degree relative (parent/sibling/offspring) with a diagnosis of CRC). There were no significant differences between the two groups in terms of age, sex, Dukes’ stage, presence of multiple cancers, comorbidity, mode of presentation to hospital and surgical management (Table 1). A higher proportion of patients with familial CRC, however, had right-sided disease (P<0.01; Table 1).

Table 1 Characteristics of the study cohort

Figure 2 shows that overall 5-year survival for familial CRC patients was significantly better than for those with sporadic disease, and that the survival advantage was correlated with the number of affected family members, notably in the small number of individuals (n=211) with two or more family members also diagnosed with CRC. This effect remained in a case-mix adjusted Cox proportional hazards model (Table 2a), with this group having a 25% reduction in their risk of death compared with those with sporadic disease (HR=0.75, 95% CI: 0.57–0.98, P=0.04). The association was statistically stronger, although smaller in magnitude, when any family history of colorectal cancer was examined (Table 2b). In this analysis, those with a family history had an 11% reduction in the risk of death compared with those with no family history (HR=0.89, 95% CI: 0.81–0.98, P=0.02).
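
The percentage reductions quoted above follow directly from the hazard ratios as (1 − HR) × 100. The short sketch below reproduces that arithmetic and checks that each 95% confidence interval excludes the null value of 1; the function names are illustrative.

```python
def percent_risk_reduction(hazard_ratio: float) -> float:
    """Relative reduction in the hazard of death, in percent."""
    return (1.0 - hazard_ratio) * 100.0


def excludes_null(ci_low: float, ci_high: float) -> bool:
    """True if the confidence interval does not contain HR = 1 (no effect)."""
    return not (ci_low <= 1.0 <= ci_high)


# Two or more affected first-degree relatives (Table 2a)
two_plus = round(percent_risk_reduction(0.75))
# Any family history (Table 2b)
any_history = round(percent_risk_reduction(0.89))
```

Both intervals (0.57–0.98 and 0.81–0.98) lie entirely below 1, which is consistent with the reported P-values being under 0.05.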

Figure 2

The 5-year survival in relation to the number of first-degree relatives with colorectal cancer.

Table 2 Cox proportional hazards model of the risk of death in relation to the (a) number of first-degree family members affected by colorectal cancer and (b) any family history of colorectal cancer

The basis of a survival advantage associated with familial CRC is unclear. It is possible that a family history of the disease heightens awareness of CRC in family members, leading to earlier detection and, thus, better prognosis. In our study, however, stage at diagnosis and the proportion of cases presenting as an emergency were similar across family history groups, and the survival difference persisted after adjusting for case mix. These observations suggest that the survival difference associated with familial CRC was not simply a consequence of lead-time bias.

Our study also showed that a higher proportion of individuals with a family history of CRC had right-sided tumours. This association is well recognised, with right-sided tumours tending to arise because of deficient mismatch repair (MMR) mechanisms that are linked to improved prognosis (Gryfe et al, 2000; Samowitz et al, 2001; Ricciardiello et al, 2003). As there is evidence that constitutional genotype influences response to chemotherapy (notably with respect to MMR status) and as family history reflects inherited genetic susceptibility, it is entirely plausible that the association between family history and better prognosis reflects an overrepresentation of MMR and polymerase gene defects affecting responsiveness. Our initial linkage has not permitted this possibility to be addressed and further work will be undertaken to investigate this issue.

A limitation of the present study is that it relied on self-reported family history, and the accuracy and completeness of this information could vary for many reasons. As the NCDR contains information on all cancers diagnosed in England, future linkages should make it possible to reduce such inaccuracy by verifying the histories provided.

The routine data that make up the NCDR may also limit the study. For example, it was not possible to match all the NSCCG patients into the NCDR as the resource is currently confined to patients diagnosed with cancer in England. Also, a small minority of the cases who should have matched into the NCDR could not be linked, and others did not link to CRC registrations. These failures were unusual but are, nonetheless, an issue. They may be due to missed registrations, incorrect coding of cancer, or inaccurate or incomplete sets of identifiers preventing linkage. Similarly, a number of individuals could not be linked because of the time period covered by the data available in the NCDR. Both the scope of the NCDR and the time lag in the collection of its data are being actively addressed, and this should enable a much larger cohort of individuals from the NSCCG to be linked.

Accepting these caveats, we have shown that it is possible to robustly match patients recruited to the NSCCG into the NCDR and, using these data, to demonstrate a statistically significant relationship between family history of CRC and better clinical outcome. Moreover, the linkage illustrates the potential of using routine data to relate genotype to management and outcome data and to enhance our understanding of the processes underlying both the development and progression of CRC. The growing amount of data related to prognosis (including detailed pathology, chemotherapy and radiotherapy data) being captured by the NCDR will also enable these analyses to be appropriately adjusted to robustly delineate the true effect of genetic variations on prognosis. Many chemotherapy drugs and treatments are being developed that target subgroups of patients with specific genetic mutations (National Institute for Health and Clinical Excellence, 2009). Significant resource is being invested in developing such treatments, but very little is known about their use and effectiveness at a population level. Linking genetic data to the management and outcome data in the NCDR offers enormous scope to increase this evidence base.