Estimating relative survival among people registered with cancer in England and Wales

Because routinely collected survival data for cancer patients in England and Wales do not typically specify cause of death, conventional estimates of survival in cancer patients based on such data are a measure of their mortality from all causes rather than their mortality due to cancer. As a result, trends in survival over time are difficult to interpret because changes in overall survival may well reflect changes in the risk of death from other causes, rather than from the cancer of interest. One way of overcoming this problem is to use some form of ‘relative survival’, defined as a measure of survival corrected for the effect of other independent causes of death. Since this concept was first introduced, various methods for calculating relative survival have been proposed and this has led to some confusion as to the most appropriate choice of estimate. This paper aims to provide an introduction to the concept of relative survival and reviews some of the suggested methods of estimation. In addition, a particularly simple but robust approach, based on expected and observed mortality, is highlighted. This method is illustrated using preliminary data from the Office for National Statistics on cancer survival in patients born after 1939 and diagnosed with cancer during 1972–84. The examples presented, although limited to analyses of a small number of selected sites, highlight some encouraging trends in survival in people aged under 35 diagnosed with leukaemia, Hodgkin's disease and testicular cancer during this period. © 1999 Cancer Research Campaign

One approach to assessing progress against cancer is to examine changes in the survival of cancer patients over time. Survival data for patients registered with cancer are compiled centrally by the Office for National Statistics (ONS) from information collected by regionally based cancer registries, and these data provide the main source of routine survival statistics for England and Wales. Although the ONS data include information on time from diagnosis to death, the specific cause of death is not routinely collected and so conventional estimates of survival based on these data reflect death from all causes rather than just the cancer of interest. Corresponding trends in survival estimates are sometimes difficult to interpret because it may be unclear whether changes in survival over time among a given group of cancer patients are in fact due to changes in the risk of death from the cancer itself, or to changes in the risk of death from causes other than that cancer.
In situations like this, it is common to use some form of 'relative survival rate' (Ederer et al, 1961) which is usually defined as a measure of survival 'corrected' for the effect of other independent causes of death, commonly referred to as background mortality.
Although this approach appears simple in principle, there has been considerable debate in the statistical literature as to the correct choice of estimate, and it has been shown that use of certain estimates can lead to spurious results for long-term relative survival (Hakulinen, 1977). This paper aims to provide a straightforward introduction to the concept of relative survival and reviews some of the suggested methods for estimation. In particular, a simple person-years approach is described that is based on directly interpretable measures such as observed and expected numbers of deaths. This approach is illustrated using preliminary data from ONS on cancer survival in patients born after 1939 who were diagnosed with cancer during the period 1972-84.

Materials and methods
The concept of relative survival was devised to provide an objective measure of the proportion of patients dying from the direct or indirect consequence of a disease in a given population and, hence, a measure of survival corrected for the effect of other independent causes of death. The basic definition of the relative survival rate is given below.
Let $T_d$ denote the time to death assuming that the subject is only at risk of death from the disease of interest, $T_e$ the time to death assuming that the subject is only at risk of death from all other causes, and $T = \min(T_d, T_e)$ the observed time to death, all measured from some suitable reference point such as date of diagnosis. Assuming that the cause of interest and all other causes are independent, the overall probability of surviving to time $t$ is
$$S(t) = S_r(t)\,S_e(t), \quad \text{that is,} \quad S_r(t) = \frac{S(t)}{S_e(t)}, \tag{1}$$
where $S_e(t)$ denotes the probability of survival to time $t$ if at risk from all other diseases, and $S(t)$ is the observed probability of survival to time $t$. The ratio $S_r(t)$ is termed the relative survival rate and can be viewed as the probability of survival to time $t$ with the disease of interest in the absence of the risk of death from other causes. Unlike crude survival, which would be expected to decline with increasing time, a levelling off in the relative survival rate after a given time, $t$, can occur if the individual is no longer at risk of death from the disease of interest. Thus, an apparent flattening of the cumulative relative survival curve is often taken to indicate that individuals who survive a given length of time are effectively 'cured' of the disease. In general, $S(t)$ is estimated by the observed life-table estimate, based on the study cohort. The term $S_e(t)$ is substituted by some estimate of the expected probability of survival to time $t$, from all causes other than the disease of interest, for a group similar to that under study. Most of the discussion of the relative survival rate has centred on the choice of estimate for $S_e(t)$. The simplest approach, adopted by Ederer et al (1961), is to calculate the expected $t$-year survival probability for each individual alive at the beginning of the follow-up, based on relevant available life-tables, and take $S_e(t)$ to be the average of these values; in other words:
$$S_e(t) = \frac{1}{N}\sum_{k=1}^{N} S_{e,k}(t), \tag{2}$$
where $S_{e,k}(t)$ is the expected $t$-year survival probability for the $k$th individual and $N$ is the total number of individuals in the study cohort.
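As an illustration of equations 1 and 2, the following is a minimal Python sketch of the averaging step. The function names, the life-table layout (a mapping from age to an annual probability of death) and the numerical values are hypothetical and are not taken from the ONS data; a real analysis would also stratify the life-table by sex and calendar period.

```python
from math import prod


def individual_expected_survival(age_at_diagnosis, t_years, annual_death_prob):
    """Expected t-year survival for one person, from an assumed life-table.

    annual_death_prob maps attained age (in years) to the probability of dying
    from other causes within the next year (a hypothetical layout).
    """
    return prod(
        1.0 - annual_death_prob[age_at_diagnosis + k] for k in range(t_years)
    )


def ederer_expected_survival(ages_at_diagnosis, t_years, annual_death_prob):
    """Equation 2: average of the individual expected survival probabilities."""
    probs = [
        individual_expected_survival(age, t_years, annual_death_prob)
        for age in ages_at_diagnosis
    ]
    return sum(probs) / len(probs)


def relative_survival(observed_survival, expected_survival):
    """Equation 1: ratio of observed to expected survival."""
    return observed_survival / expected_survival


# Illustrative values only: a flat 1% annual background death probability.
life_table = {age: 0.01 for age in range(0, 120)}
cohort_ages = [20, 25, 30, 34]

s_e = ederer_expected_survival(cohort_ages, 5, life_table)
s_r = relative_survival(0.70, s_e)  # 0.70 is an assumed observed 5-year survival
print(f"expected 5-year survival: {s_e:.4f}, relative survival: {s_r:.4f}")
```

Because the average is taken over the cohort as it stood at diagnosis, the result does not change if patients are later withdrawn or censored.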
This estimate of $S_e(t)$ depends only on the composition of the cohort at the beginning of the study and takes no account of the subsequent withdrawal pattern. The main problem with the estimate in equation 1, which was originally identified by Hakulinen (1977), becomes apparent when long-term relative survival rates are considered. Suppose that we are interested in the overall relative survival rate for a heterogeneous cohort; using equations 1 and 2, this can be expressed as the following weighted average
$$S_r(t) = \sum_j w_j\,S_{r,j}(t), \tag{3}$$
where $j$ is taken to denote age group or any other prognostic factor, $n_j$ is the number of individuals in the $j$th group at the start of the study, $S_{r,j}(t)$ and $S_{e,j}(t)$ are the relative and expected survival rates in the $j$th group, and
$$w_j = \frac{n_j\,S_{e,j}(t)}{\sum_m n_m\,S_{e,m}(t)}.$$
As $t$ becomes large, $w_j$ tends to unity for the group with the best 'expected survival', $S_{e,j}(t)$, and to zero for all other groups. Thus, as the time period of interest increases, the overall relative survival estimate will tend to that of the group with the best expected survival, usually the youngest group. If one plots the relative survival rate (equation 3) for a heterogeneous population as a function of time, it sometimes appears as if relative survival starts to increase after a long period of time. Many people have attributed this to the possibility that improved medical care of cancer patients prevents them dying of causes other than their primary disease, but it may well be a spurious effect of using equation 2 as an estimate of $S_e(t)$ (Hakulinen, 1977).
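The drift in the weights can be seen in a small numerical sketch. The Python below assumes, purely for illustration, two groups of equal size with constant background death rates, so that expected survival in each group is exponential, and holds each group's relative survival fixed over time; none of these values comes from the data analysed in this paper.

```python
import math


def ederer_weights(group_sizes, background_hazards, t):
    """Weights w_j = n_j * S_ej(t) / sum_m n_m * S_em(t).

    Expected survival is assumed exponential, S_ej(t) = exp(-hazard_j * t),
    which is a simplification made only for this illustration.
    """
    terms = [n * math.exp(-h * t) for n, h in zip(group_sizes, background_hazards)]
    total = sum(terms)
    return [x / total for x in terms]


def overall_relative_survival(group_sizes, background_hazards, group_rel_survival, t):
    """Weighted average of the group-specific relative survival rates."""
    weights = ederer_weights(group_sizes, background_hazards, t)
    return sum(w * s for w, s in zip(weights, group_rel_survival))


# Hypothetical cohort: 100 younger and 100 older patients.
sizes = [100, 100]
hazards = [0.001, 0.05]  # assumed background death rates per year
rel_surv = [0.8, 0.4]    # assumed group relative survival, held fixed over time

for t in (0, 10, 20, 40):
    w_young = ederer_weights(sizes, hazards, t)[0]
    s_r = overall_relative_survival(sizes, hazards, rel_surv, t)
    print(f"t={t:2d}  weight on younger group={w_young:.3f}  overall S_r={s_r:.3f}")
```

Under these assumptions the weight on the younger group starts at one half and rises towards one, so the overall estimate climbs towards the younger group's value even though neither group's relative survival has changed.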
If the expected survival of the study group was in itself of interest, equation 2 would be the correct estimate because it is based entirely on the composition of the study group at time 0 and is unaffected by the subsequent withdrawal pattern. However, the other component in equation 1, the observed survival $S(t)$, is heavily dependent on the pattern of withdrawal. The estimate of $S_e(t)$ should also, therefore, take into account the changing composition of the cohort by, for example, replacing $S_e(t)$ with a life-table estimate of expected survival as proposed by Ederer and Heise (1959). A simpler alternative approach, however, is to use the notion of excess death rates based on observed and expected mortality as suggested, for example, by Pocock et al (1982). In their paper, Pocock et al use excess death rates to make valid comparisons of survival between groups with potentially different background mortality rates. In what follows, we assume that interest lies specifically in estimating relative survival and, therefore, show how to translate excess or net mortality into corresponding relative survival estimates and derive corresponding approximate confidence intervals.
For the $i$th year (or other suitable time interval) after diagnosis, let $O_i$ denote the observed number of deaths in the study cohort, $E_i$ the expected number of deaths from causes other than the disease of interest (based on appropriate regional rates and taking into account calendar period, age and/or other relevant factors), and $Y_i$ the number of person-years contributed during that interval. The 'net mortality rate', that is the difference between the observed and expected death rates, in the $i$th year is then
$$\lambda_i = \frac{O_i - E_i}{Y_i}.$$
This net death rate is taken to represent the death rate due to the cause of interest in the $i$th year. If patients who survive to time $t$ are no longer at any excess risk of death relative to the general population then the net mortality rate at time $t$ should be 0 and this can be used as a statistical test of 'curability'. The cumulative $t$-year net mortality rate can be expressed as
$$\Lambda(t) = \sum_{i=1}^{t} \frac{O_i - E_i}{Y_i}$$
and, hence, the $t$-year relative survival rate is given by
$$S_r(t) = \exp\{-\Lambda(t)\}. \tag{4}$$
Approximate confidence intervals for the relative survival estimate, $S_r(t)$, can be derived by assuming a normal distribution for $\log\{-\log S_r(t)\}$, with estimated variance (treating each $O_i$ as a Poisson count and $E_i$ and $Y_i$ as fixed)
$$V(t) = \frac{\sum_{i=1}^{t} O_i / Y_i^{2}}{\Lambda(t)^{2}}.$$
Thus, an approximate 95% confidence interval for $S_r(t)$ is given by
$$S_r(t)^{\exp\{\pm 1.96\sqrt{V(t)}\}}.$$
Person-years for each time interval and the corresponding expected number of deaths can be obtained using any of the standard algorithms, for example PERSON-YEARS (Coleman et al, 1986), and the remaining calculations are easily programmed. Several other approaches to the estimation of relative survival have been proposed (Hakulinen, 1982, 1985), but these are generally more complex than those described here. The person-years approach described above can be thought of as a special case of a more generalized modelling approach proposed by Esteve et al (1990) in which the net mortality rate is allowed to depend on any number of explanatory variables.
Such a modelling approach is clearly more appropriate when primary interest lies in assessing which of many potential factors are important in determining mortality from the cause of interest. In some situations, it may be of interest to compare survival in a group of patients with their expected survival given their observed values of certain relevant covariates. In this case, Thomsen et al (1991) consider a continuous time analogue of relative survival in which the component representing expected survival takes into account historical information about the relationship between survival and the covariates of interest.
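The person-years calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the program used for the analyses: the function name and the input counts are hypothetical, the yearly values of O, E and Y are taken as already computed (for example by a person-years program), and the variance on the log(-log) scale is obtained by treating each observed count as Poisson with E and Y fixed.

```python
import math


def relative_survival_person_years(observed, expected, person_years, z=1.96):
    """Cumulative relative survival with approximate confidence limits.

    observed, expected, person_years: per-interval O, E and Y, in order of
    time since diagnosis. Returns one (S_r, lower, upper) tuple per interval.
    The variance formula assumes O is Poisson and E, Y are fixed (an
    assumption of this sketch).
    """
    results = []
    cum_net = 0.0  # cumulative net mortality rate
    cum_var = 0.0  # variance of the cumulative net mortality rate
    for o, e, y in zip(observed, expected, person_years):
        cum_net += (o - e) / y
        cum_var += o / y ** 2
        s_r = math.exp(-cum_net)
        if cum_net > 0:
            # Standard error of log(-log S_r) = log(cum_net), by the delta method.
            se = math.sqrt(cum_var) / cum_net
            lower = s_r ** math.exp(z * se)
            upper = s_r ** math.exp(-z * se)
        else:
            # log(-log S_r) is undefined when there is no cumulative excess.
            lower = upper = None
        results.append((s_r, lower, upper))
    return results


# Hypothetical yearly counts for the first three years after diagnosis.
O = [30, 20, 10]
E = [1.0, 1.0, 1.0]
Y = [950.0, 900.0, 880.0]

for year, (s_r, lo, hi) in enumerate(relative_survival_person_years(O, E, Y), 1):
    print(f"year {year}: S_r={s_r:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```

Working on the log(-log) scale keeps the confidence limits between 0 and 1. The sketch returns no limits for an interval in which the cumulative net mortality rate is not positive, because the transformation is undefined there.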

Application of the method to the analysis of survival of young patients diagnosed with cancer in England and Wales during 1972-84
The data used here are a subset of the national data on patients with cancer, compiled by ONS, and are unique in that the vital status of every patient registered after 1971 has been specially checked against the National Health Service Central Register. This subset includes data for only a few cancer sites and is limited to people born after 1939. The examples in this paper are, therefore, limited to results of analyses on a small number of selected cancers in relatively young patients, aged under 35 at diagnosis, and diagnosed between 1972 and 1984. Recent redevelopment of the computer system at ONS, however, means that vital status can now effectively be confirmed for all individuals registered with cancer in England and Wales during 1971-90, and this initial report will, therefore, be followed by an analysis of the entire ONS database for this period. As outlined in Materials and methods, time since diagnosis was divided up into yearly intervals; for each of these intervals, person-years and expected numbers of deaths were calculated using published death rates for England and Wales from all causes except that under study. All calculations were carried out using the PERSON-YEARS program (Coleman et al, 1986). For the purposes of these analyses, follow-up was censored at 1 January 1990 and cancers which were registered at the time of death were excluded. Results are shown separately for five selected sites. For each cancer, the cumulative relative survival rate (equation 4) was plotted against time since diagnosis. Because we are particularly interested in trends in survival over calendar period, relative survival curves are calculated separately for individuals diagnosed in periods 1972-75, 1976-78, 1979-81 and 1982-84. Figures 1-7 show plots of the cumulative relative survival up to 10 years after diagnosis for selected primary sites and age groups according to calendar period of diagnosis.
Table 1 summarizes these results and gives the total number of registrations available for analysis in each case.
The results show that for some, but not all, of these cancers, survival for individuals aged 15-34 at diagnosis improved substantially between 1972 and 1984. The most dramatic increases in survival shown here are for leukaemia. Because leukaemia is the most common childhood cancer, survival with acute lymphatic leukaemia (ALL) and with acute myeloid leukaemia (AML) is given both for children diagnosed under the age of 15 (Figures 1 and 2) and for adults aged 15-34 at diagnosis (Figures 3 and 4). The results for childhood leukaemia show that 5-year relative survival rates with AML increased from 9% in 1972-75 to 34% in 1982-84, whereas the rates for ALL increased from 45% to 74% over the same period. The results for adult leukaemias also show marked improvements in survival. In particular, 5-year relative survival rates for AML diagnosed at ages 15-34 rose from 6% in 1972-75 to 31% in 1982-84, whereas figures for ALL showed an increase from 22% to 44% over the same period. The results for testicular cancer (Figure 5) also show a gradual and significant increase in survival over this period, with 5-year relative survival rising from 65% in 1972-75 to 89% in 1982-84. Patients with Hodgkin's disease, aged 15-34 at diagnosis, showed some improvement in survival between 1972 and 1984 (Figure 6), with 5-year relative survival increasing from 76% in 1972-75 to 87% in 1982-84. By comparison, survival with breast cancer among patients aged 15-34 (Figure 7) showed relatively little change over this period.

Discussion
The main purpose of this paper is to illustrate a simple method of calculating relative survival based on routine data. Nevertheless, it highlights some interesting trends in the survival of young people diagnosed with certain cancers. Moreover, the patterns of survival observed are reassuringly in line with what is known about recent advances in the treatment of those cancers. For example, examination of the relative survival rates for testicular cancer by individual year between 1972 and 1984 suggests that the biggest increases occurred during the periods 1976-77 and 1979-80 (data not shown). This is likely to be due to the introduction of cisplatin-based chemotherapy, which was first used around 1976 before the commencement of widespread use in 1979 (Edmiston and Stewart, 1993). With regard to acute lymphatic leukaemia and acute myeloid leukaemia, the results shown here for children diagnosed under the age of 15 show similar patterns to those found by Stiller and Bunch (1990), who examined trends in survival with childhood cancers in England and Wales over the period 1971-85. The results also suggest that, to a certain extent, these improvements extend to young adults diagnosed at ages 15-34 and this finding is broadly in line with data on adolescent patients from Sweden (Adami et al, 1992).
One question which is often of interest in analyses of survival patterns in cancer patients is whether or not patients can ever be assumed to be effectively 'cured' of a particular cancer. One definition of curability, though not necessarily equivalent to a clinical definition, is when a patient assumes the same mortality rate as that of the general population. In such cases, the cumulative relative survival curve would be expected to level off after a given length of time. For some of the cancers shown here, such as leukaemia and testicular cancer, this ultimately appears to be the case. For example, Figure 5 suggests that not only do as many as 89% of all testicular cancer patients diagnosed in 1982-84 survive beyond 5 years, but that those who do are no longer at any increased risk of death compared with the general population. For cancers such as breast cancer, in which there is no obvious levelling off in the cumulative relative survival rate before 10 years after diagnosis (Figure 7), the picture is less clear, and this has long been an area for discussion in the literature (Brinkley and Haybittle, 1975; Duncan and Kerr, 1976; Langlands, 1995). In such cases, it is important to extend the estimates of cumulative relative survival to assess whether a certain proportion of patients ever reach a point at which they are no longer at an increased risk of death relative to the general population.
In this particular example, in which the data consist of relatively young patients, it is unlikely that the relative survival rates would have differed substantially from crude survival rates. There would, however, almost certainly be important differences if assessing trends in survival in older patients, among whom a substantial proportion of the observed deaths would be likely to be due to other causes. Furthermore, although we have introduced the notion of relative survival in the context of analyses in which the cause of death is unavailable, there are arguments for applying this method to data in which cause of death can be ascertained from death certification data. Several authors have commented on the fact that even when cause of death has been routinely recorded, it is often difficult to determine whether a patient's death was due directly or indirectly to their cancer, and in those cases relative survival may provide a more objective means of removing the effect of mortality from other causes (Berkson and Gage, 1950).
In summary, this report discusses some of the issues surrounding the analysis of trends in survival based on routinely collected cancer registration data, and illustrates a relatively simple but robust method of calculating an estimate of relative survival which is suitable for use with large data sets. The examples presented here also highlight some very promising trends in the survival of young adults with certain cancers, particularly those with leukaemia and testicular cancer.