Calculating age-adjusted cancer survival estimates when age-specific data are sparse: an empirical evaluation of various methods

We empirically evaluated the performance of various methods of calculating age-adjusted survival estimates when age-specific data are sparse. We illustrate that a recently proposed alternative method of age adjustment, together with the use of balanced age groups or age truncation, may be useful for enhancing the calculability and reliability of adjusted survival estimates.

Age adjustment is widely used in studies comparing the survival of different cancer patient populations. Most commonly, the adjusted estimate is calculated as a weighted average of age-specific survival estimates, with weights reflecting the age distribution of a defined standard population. There has, however, been great diversity in the practice of age adjustment. This diversity mainly concerns the definition, number and width of the age groups, and the methods used to overcome difficulties when age-specific survival cannot be calculated. Such situations are not uncommon in comparative survival studies, particularly when data on rare cancers or data from registries covering small populations are involved. Examples of the practice of age adjustment of cancer survival are given in Table 1.
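The traditional (direct) method described above can be sketched as follows. The function forms the weighted average of age-specific survival estimates. The age groups, standard shares and survival values are hypothetical; in this study, the shares would come from the site-specific WSCPP. An age group whose estimate could not be calculated is represented by `None`. In that case the sketch returns no adjusted estimate, just as the traditional method fails.

```python
from typing import Mapping, Optional


def traditional_adjusted_survival(
    age_specific: Mapping[str, Optional[float]],
    standard_shares: Mapping[str, float],
) -> Optional[float]:
    """Direct age adjustment: sum over age groups of standard share x survival.

    age_specific: survival estimate per age group, or None if it could not
        be calculated (e.g. too few patients in the group).
    standard_shares: proportion of the standard population in each age
        group; assumed to sum to 1.
    Returns None if any age group lacks an estimate, because the weighted
    average is then undefined.
    """
    adjusted = 0.0
    for group, share in standard_shares.items():
        estimate = age_specific.get(group)
        if estimate is None:
            return None  # adjustment fails for the whole population
        adjusted += share * estimate
    return adjusted


# Hypothetical standard shares and age-specific 5-year survival estimates.
shares = {"15-44": 0.3, "45-64": 0.4, "65+": 0.3}
print(traditional_adjusted_survival({"15-44": 0.7, "45-64": 0.5, "65+": 0.4}, shares))
print(traditional_adjusted_survival({"15-44": 0.7, "45-64": 0.5, "65+": None}, shares))
```

The first call gives approximately 0.53 (0.3 × 0.7 + 0.4 × 0.5 + 0.3 × 0.4). The second call returns `None` because the oldest group has no estimate.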
Recently, an alternative method of age adjustment of survival estimates was proposed. In this method, weights are assigned to individual patients before the survival analysis is carried out. Each patient's weight is the proportion of the standard population in the patient's age group divided by the corresponding proportion in the study population. The survival analysis is then carried out on the weighted individual data, without the need to calculate age-specific survival estimates.
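The weighting step of the alternative method can be sketched as follows. This is a minimal illustration only, not the adperiodh macro used in the study. Each patient's weight is the standard-population share of the patient's age group divided by the study-population share. Observed (not relative) survival is then estimated with a weighted product-limit estimator. The function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def individual_weights(groups: Sequence[str], standard_shares: Dict[str, float]) -> List[float]:
    """Weight per patient = standard share / study share of the patient's age group."""
    n = len(groups)
    counts = Counter(groups)
    empty = [g for g in standard_shares if counts[g] == 0]
    if empty:
        # The study share is zero, so the ratio for this group is undefined.
        raise ValueError(f"no patients in age group(s) {empty}")
    return [standard_shares[g] / (counts[g] / n) for g in groups]


def weighted_product_limit(times: Sequence[float], events: Sequence[int],
                           weights: Sequence[float], t: float) -> float:
    """Weighted Kaplan-Meier estimate of survival at time t.

    events[i] is 1 for a death and 0 for a censored observation; each patient
    contributes their weight to the risk set and to the death count.
    """
    survival = 1.0
    death_times = sorted({ti for ti, e in zip(times, events) if e == 1 and ti <= t})
    for dt in death_times:
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= dt)
        deaths = sum(w for ti, e, w in zip(times, events, weights) if e == 1 and ti == dt)
        survival *= 1.0 - deaths / at_risk
    return survival


# Toy data: two younger patients censored at 5 years, two older patients
# dying at 1 and 3 years. The standard population is older than the study
# population, so older patients are up-weighted.
patient_groups = ["young", "young", "old", "old"]
patient_weights = individual_weights(patient_groups, {"young": 0.25, "old": 0.75})
print(patient_weights)  # [0.5, 0.5, 1.5, 1.5]
print(weighted_product_limit([5, 5, 1, 3], [0, 0, 1, 1], patient_weights, t=5))
```

With these toy numbers, the weighted estimate at 5 years is approximately 0.25. In this case it equals the direct standardisation 0.25 × 1 + 0.75 × 0 of the group-specific Kaplan-Meier estimates. In general, the two approaches need not agree exactly. The `ValueError` mirrors the situation in which an empty age group makes the weight incalculable.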
In this paper, we empirically evaluate and compare the performance of different age adjustment methods in situations where age-specific data are sparse.

Data set
Data from the Zimbabwe National Cancer Registry (ZNCR) (Chokunonga et al, 2000; Parkin et al, 2003) were used. The registry, established in 1985, covers the population of Harare, the capital of Zimbabwe. Its operational circumstances have been described in detail elsewhere. In terms of these circumstances, the registry may be considered a typical example of an urban cancer registry in a developing country with adequate data quality (Sankaranarayanan et al, 1998; Parkin et al, 2003; Chokunonga et al, 2004). Survival results for a large number of cancer sites have recently been published (Gondos et al, 2004).
We assessed various options for age adjustment of 5-year survival estimates among patients diagnosed in 1993–1997 and followed up until 31 December 1999. Five types of cancer were studied: skin melanoma, breast cancer, cervical cancer, prostate cancer and lymphoma. Breast and cervical cancers were included because they were represented by relatively large samples. For prostate cancer, only 3-year survival could be calculated because of the lack of patients with 5 years of follow-up. It was included because of its unusual age distribution (a high proportion of older patients). Lymphomas were included because of the uniquely wide age range of the patients. Skin melanomas were selected as an example of an analysis with a very small sample.

Calculation of age-adjusted survival
The site-specific World Standard Cancer Patient Populations (WSCPP) were used as standard populations (Black and Bashir, 1998). Survival estimates were adjusted with both the traditional method and the recently proposed alternative method, using the age categorisation schemes described below. Throughout this paper, relative rather than absolute survival estimates are presented. Relative survival was calculated according to Hakulinen's method (Hakulinen, 1982), using the WHO life tables for Zimbabwe (WHO, 2001). The calculations were carried out using the SAS macros periodh (Brenner et al, 2002) and adperiodh.
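The quantities involved can be summarised as follows, using notation introduced here for illustration only. Relative survival in age group $j$ is the ratio of observed to expected survival. The traditional adjusted estimate weights these ratios by the standard-population shares $w_j$. The alternative method instead gives patient $i$, who belongs to age group $j(i)$, the weight $v_i$, where $p_j$ is the study-population share of group $j$. How Hakulinen's method derives expected survival from the life tables is not reproduced here.

```latex
R_j(t) = \frac{S^{\mathrm{obs}}_j(t)}{S^{\mathrm{exp}}_j(t)},
\qquad
R^{\mathrm{adj}}(t) = \sum_{j=1}^{J} w_j \, R_j(t),
\qquad
\sum_{j=1}^{J} w_j = 1,
\qquad
v_i = \frac{w_{j(i)}}{p_{j(i)}}
```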
Scheme 1: Age adjustment with fixed age group width
First, for each cancer site, we classified the patients into 5-, 10-, 15- and 30-year age groups. With each of these classifications, the youngest and the oldest age groups were chosen so that they actually included patients. The age of the youngest/oldest patient determined the first/last age group. For example, if the youngest patient was 28, the first 5/10/15/30-year age group was 25–29/20–29/15–29/0–29, respectively.
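The Scheme 1 rule can be sketched as follows. The sketch assumes that group boundaries fall on multiples of the group width, which the worked example suggests. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def fixed_width_groups(ages: Sequence[int], width: int) -> List[Tuple[int, int]]:
    """Fixed-width age groups running from the youngest to the oldest patient.

    Boundaries are placed at multiples of `width`, so the first group is the
    one containing the youngest patient and the last group is the one
    containing the oldest patient.
    """
    first = (min(ages) // width) * width
    last = (max(ages) // width) * width
    return [(lower, lower + width - 1) for lower in range(first, last + 1, width)]


# Youngest patient aged 28, as in the example in the text.
ages = [28, 41, 57, 63]
for width in (5, 10, 15, 30):
    print(width, fixed_width_groups(ages, width)[0])
```

The loop prints first groups of (25, 29), (20, 29), (15, 29) and (0, 29), matching the 25–29/20–29/15–29/0–29 example.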

Scheme 2: Collapsing the youngest and oldest age groups
If needed, that is, if the age adjustment failed, we modified the boundaries of the youngest and the oldest age groups to enhance calculability. The aim was that age-specific survival could be calculated in these two age groups. Age groups in between were left unchanged, except for the 30-year age groups, where shifting the first or last age group also affected the middle age group.
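One possible reading of Scheme 2 is sketched below: the youngest (or oldest) age group is merged with its neighbour until it is deemed calculable. Whether 5-year survival can be calculated depends on follow-up and on the estimation software. The sketch therefore uses a hypothetical minimum number of patients (`min_patients`) as a stand-in criterion; it is not the rule used in the study. The standard-population shares of merged groups would be summed accordingly.

```python
from typing import List, Sequence, Tuple

Group = Tuple[int, int]


def collapse_edge_groups(groups: Sequence[Group], counts: Sequence[int],
                         min_patients: int) -> Tuple[List[Group], List[int]]:
    """Widen the youngest and oldest age groups until each holds at least
    `min_patients` patients (hypothetical calculability criterion).

    Interior groups are only affected when an edge group absorbs them.
    """
    groups, counts = list(groups), list(counts)  # do not modify the inputs
    # Youngest group: merge with the next older group while too small.
    while len(groups) > 1 and counts[0] < min_patients:
        groups[0:2] = [(groups[0][0], groups[1][1])]
        counts[0:2] = [counts[0] + counts[1]]
    # Oldest group: merge with the next younger group while too small.
    while len(groups) > 1 and counts[-1] < min_patients:
        groups[-2:] = [(groups[-2][0], groups[-1][1])]
        counts[-2:] = [counts[-2] + counts[-1]]
    return groups, counts


# Hypothetical 10-year groups with sparse data at both ends.
groups = [(20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
counts = [2, 10, 20, 25, 12, 3]
print(collapse_edge_groups(groups, counts, min_patients=5))
```

Here the call yields groups 20–39, 40–49, 50–59 and 60–79 with 12, 20, 25 and 15 patients. The middle groups are unchanged.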

Scheme 3: Balanced age groups
Here, we reorganised the age groups so that the patients were distributed approximately evenly across them. The number of age groups was varied between 3 and 5. The boundaries of these 'balanced' age groups were aligned with the nearest boundaries of the original 5-year age groups.
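Scheme 3 can be sketched as follows. Cut points are placed near the empirical quantiles of the age distribution and rounded to the nearest multiple of 5. Each balanced group is therefore a union of original 5-year groups. The quantile rule and the function name are assumptions for illustration. Duplicate cut points are dropped, so fewer than `k` groups may result.

```python
from typing import List, Sequence, Tuple


def balanced_groups(ages: Sequence[int], k: int, step: int = 5) -> List[Tuple[int, int]]:
    """Split patients into about k age groups of similar size, with
    boundaries aligned to the nearest multiple of `step` years."""
    ordered = sorted(ages)
    n = len(ordered)
    cuts: List[int] = []
    for i in range(1, k):
        quantile_age = ordered[min(n - 1, (i * n) // k)]  # approx. i/k quantile
        cut = step * round(quantile_age / step)  # nearest 5-year boundary
        # Skip duplicate cuts and cuts that would leave an end group empty.
        if ordered[0] < cut <= ordered[-1] and (not cuts or cut > cuts[-1]):
            cuts.append(cut)
    lowers = [(ordered[0] // step) * step] + cuts
    uppers = [c - 1 for c in cuts] + [(ordered[-1] // step) * step + step - 1]
    return list(zip(lowers, uppers))


# Hypothetical uniform ages 20-79 split into three balanced groups.
print(balanced_groups(list(range(20, 80)), 3))
```

For this uniform example, the result is 20–39, 40–59 and 60–79, with 20 patients in each group.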

Calculation of truncated survival
Age-specific survival estimates for older age groups are often unreliable in data from developing countries. Moreover, as in our study, the standard population often gives more weight to the oldest age groups than the study population does. In such cases, the adjusted survival estimate can easily become unreliable, because the adjustment assigns a large weight to an unreliable age-specific survival estimate. We therefore repeated all calculations with a truncated age range (0–74 years). This follows the practice of the largest comparative survival study from developing countries to date (Sankaranarayanan et al, 1998).

RESULTS
Table 2 shows the numbers of patients by age group in the Zimbabwean cancer patient populations. It also illustrates the differences between the age distributions of the study and standard populations, and indicates the age groups for which 5-year age-specific survival could not be calculated. The WSCPP include a much higher proportion of patients in the oldest two to four age groups than the Zimbabwean patient populations do.
Table 3a provides survival estimates adjusted by the traditional and the alternative method, with all ages included, for each of the schemes applied. With the traditional method, adjustment with fixed age groups often failed because 5-year age-specific survival could not be calculated for one or more age groups. With collapsed or balanced age groups, traditional age adjustment became feasible in most cases. The alternative method was feasible even with most fixed age group categorisations. The exceptions were the 5-year categories for skin melanomas and lymphomas, where one age group contained no patients. The weight for that group could therefore not be calculated, because its study-population proportion, the denominator of the weight, was zero. With both the traditional and the alternative method, different age groupings resulted in different adjusted survival estimates for all cancer types studied. With balanced age groups, this variation was strongly reduced.
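The age truncation described under 'Calculation of truncated survival' can be sketched as follows. Both the patients and the standard population are restricted to ages 0–74, and the standard shares of the retained age groups are rescaled to sum to 1. The rescaling step is an assumption here, although it is the usual way of standardising over a truncated age range. The group boundaries and shares below are hypothetical, not WSCPP values.

```python
from typing import Dict, List, Sequence, Tuple

Group = Tuple[int, int]


def truncate(ages: Sequence[int], standard_shares: Dict[Group, float],
             max_age: int = 74) -> Tuple[List[int], Dict[Group, float]]:
    """Drop patients older than max_age and age groups extending beyond it,
    then rescale the remaining standard shares so that they sum to 1.

    Assumes an age-group boundary at max_age + 1; a group straddling it
    (e.g. 70-79) would be dropped entirely.
    """
    kept_ages = [a for a in ages if a <= max_age]
    kept = {g: s for g, s in standard_shares.items() if g[1] <= max_age}
    total = sum(kept.values())
    return kept_ages, {g: s / total for g, s in kept.items()}


shares = {(0, 44): 0.2, (45, 64): 0.3, (65, 74): 0.2, (75, 99): 0.3}
kept_ages, truncated_shares = truncate([30, 52, 70, 81], shares)
print(kept_ages)
print(truncated_shares)
```

In this hypothetical example, truncation removes the oldest group, which carries 30% of the standard weight. An unreliable estimate for that group can then no longer dominate the adjusted value.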
Table 3b summarises and compares the adjusted survival estimates obtained with all ages included and with the truncated cancer patient populations. Truncation did not substantially alter the crude survival estimates. The differences between the crude estimates for all ages and for the truncated age range were between 0.6 and 3.6 percentage points. However, the variation in the adjusted survival estimates among the different categorisation schemes was strongly reduced for all cancer sites.

DISCUSSION
With the traditional method, the calculation of 5-year age-specific survival estimates often failed in age groups with only a few patients. These failures could mostly be overcome by applying different age categorisation schemes, that is, by collapsing or balancing the age groups. With the alternative method, calculability was generally very good, even with the fixed age group categorisation schemes. However, with both methods, the different age group classifications produced age-adjusted estimates with rather large variability. This was mainly because large weights were assigned to the older age groups, in which data in the ZNCR were sparse. This variability could be effectively reduced by using balanced age groups and by restricting the analysis to a truncated age range of up to 74 years.
There is no theoretically optimal choice for the number of age groups, their width or the boundaries of the individual age groups. For practical purposes, however, adjusted estimates should be reasonably consistent, regardless of which age classification is used. Limitations in data quality frequently impair the reliability of age-specific data among older patients, particularly in patient populations from developing countries (Sankaranarayanan et al, 1998). In such cases, truncated adjusted survival may provide estimates of improved reliability and comparability. On the other hand, truncation means that the survival experience of older patients is disregarded, which is justified only if the proportion of these patients is very small. The use of balanced age groups is not subject to this limitation and may be preferred if the exclusion of older patients is a concern.
When interpreting the results, the following limitations should be kept in mind. Our empirical evaluation is based on five cancer sites from only one cancer registry. The cancer sites were chosen to represent various sample sizes, age distributions, and both higher and lower survival, and therefore reflect a variety of scenarios encountered in comparative survival analyses. The sparseness of the data, and the discrepancy between the age distributions of the study and standard populations, were probably more extreme than in most other practical applications. However, such extreme situations may help to demonstrate the implications of the various analysis strategies under these conditions. We did not report standard errors of the survival estimates obtained with the various methods. As recently demonstrated elsewhere (Brenner and Hakulinen, 2005), the alternative method often provides estimates with smaller standard errors.
In summary, our results illustrate that the enhanced calculability of age-adjusted survival estimates with the alternative method may be relevant in practice. Nevertheless, the unreliability of estimates in the case of sparse data within age groups may remain a concern for both the traditional and the alternative method. In such situations, the use of balanced age groups or the calculation of truncated age-adjusted survival estimates may be useful analytical options.