Main

Analyses of long-term survival are a key component of monitoring progress against cancer, and such monitoring should be as up-to-date as possible. However, with traditional cohort-based survival analyses, even the most up-to-date long-term cancer survival estimates pertain to patients diagnosed many years ago and followed up for many years since then. Period analysis, first introduced in 1996 (Brenner and Gefeller, 1996), overcomes this problem by restricting the analysis to the survival experience, in the most recent years for which data are available, of patients diagnosed in various calendar years. For example, a cancer registry that, in 2012, has complete data on cancer incidence and mortality follow-up up to the year 2010 might derive a cohort estimate of 5-year survival for patients diagnosed in 2001–2005. However, this estimate would essentially reflect the level of cancer care of 10 years ago. A better prediction of the survival of newly diagnosed cancer patients can be obtained by a period analysis for 2006–2010, which restricts the analysis to the survival experience in 2006–2010 of patients diagnosed in 2001–2010. Extensive empirical investigations have demonstrated that period analysis provides better predictions of the survival of newly diagnosed patients than traditional cohort analysis (Brenner and Hakulinen, 2002a; Brenner et al, 2002b; Talback et al, 2004; Ellison, 2006). The method has therefore been increasingly used in cancer survival studies in recent years (e.g. Verdecchia et al, 2007; Sankaranarayanan et al, 2010; Coleman et al, 2011; van de Schans et al, 2011).
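To make the contrast between the two approaches concrete, the sketch below computes a cohort estimate and a period estimate from the same individual patient records. It is a minimal illustration under stated assumptions, not the authors' method or the SAS macros cited later: the `Patient` record and the `observed_survival` function are hypothetical, exact decimal dates are assumed, each year of follow-up uses a constant-hazard estimate (deaths divided by person-time) instead of the actuarial life table, and observed rather than relative survival is computed. The essential step is that, in period analysis, each patient's time in a given follow-up year is clipped to the calendar window.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    dx: float    # date of diagnosis as a decimal calendar year, e.g. 1999.5
    exit: float  # date of death or end of follow-up, decimal calendar year
    died: bool   # True if `exit` is a death, False if the patient was censored


def observed_survival(patients, horizon, dx_years=None, window=None):
    """Cumulative observed survival `horizon` years after diagnosis.

    Cohort analysis: pass dx_years=(first, last) and leave window=None.
    Period analysis: pass window=(first, last) calendar years; only the
    person-time lived inside the window is used, i.e. follow-up is
    left-truncated at the window start and censored at the window end.
    """
    survival = 1.0
    for k in range(horizon):  # follow-up year k + 1 runs from dx + k to dx + k + 1
        person_time, deaths = 0.0, 0
        for p in patients:
            if dx_years and not dx_years[0] <= math.floor(p.dx) <= dx_years[1]:
                continue
            start, end = p.dx + k, min(p.dx + k + 1, p.exit)
            if window:
                start = max(start, window[0])
                end = min(end, window[1] + 1)  # window closes at the end of its last year
            if end <= start:
                continue  # no time at risk in this follow-up year (inside the window)
            person_time += end - start
            if p.died and p.exit == end:  # the death falls inside the counted segment
                deaths += 1
        if person_time == 0:
            raise ValueError(f"no person-time in follow-up year {k + 1}")
        # Constant hazard within the year: conditional survival = exp(-deaths / person-time)
        survival *= math.exp(-deaths / person_time)
    return survival


# Toy data: one death in 1991, survivors diagnosed 1994-2003 and censored in 2009.
pts = [Patient(1990.5, 1991.0, True)]
pts += [Patient(y + 0.5, 2009.0, False) for y in range(1994, 2004)]
print(observed_survival(pts, 5, window=(1999, 2003)))    # period estimate, 1999-2003
print(observed_survival(pts, 5, dx_years=(1990, 2003)))  # cohort diagnosed 1990-2003
```

On this toy data, the 1991 death lowers the cohort estimate but lies outside the 1999–2003 window, so the period estimate is unaffected; this is the mechanism by which period estimates track recent survival more closely.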

A question that requires specific attention in the reporting of results from the period analysis of cancer survival is the study population to which the survival estimates pertain. This study population is commonly described with respect to sociodemographic and clinical characteristics in one or more introductory tables. For example, in a comparison of survival estimates from two countries, the study populations of the countries should be compared in an introductory table with respect to factors like age, sex, cancer type, and stage. Such a table helps in interpreting the findings of a study and enables a more detailed comparison of results across articles. In studies using cohort analysis, which pertains to a specific cohort of patients, this very cohort is clearly the study population to be described. In studies using period analysis, the relevant study population is less obvious.

There appear to be at least two ‘natural candidates’, and both have been reported in studies using period analysis (Ellison et al, 2007; Gondos et al, 2007; Verdecchia et al, 2007; Quaglia et al, 2009; Redaniel et al, 2009; Pollack et al, 2011; Chen et al, 2012), which shows that identifying the appropriate study population is relevant for enhancing comparability across studies. The first consists of all patients who potentially contribute some data to the survival analysis (‘full cohort’). For illustration, consider a period analysis for 1999–2003, as shown in Figure 1 (bold frame). In this example, the full cohort would include all patients diagnosed in 1994–2003, even though some of them make only very limited or no contributions to the survival analysis. For example, patients diagnosed in 1994 would only contribute some survival experience in the 5th year of follow-up, and only if they were still alive in 1999. If survival improves over time, survival of the 1994–2003 cohort might be substantially lower than survival of the 1999–2003 cohort, whose survival period analysis aims to approximate. The second ‘natural candidate’ for the study population to be described would include only patients diagnosed in the period of interest (here: 1999–2003) (‘restricted cohort’). Although period analysis for a given period often approximates the survival of the cohort diagnosed in that period well, survival of this cohort may typically end up somewhat better than the period estimate when survival is improving (Brenner and Hakulinen, 2002a; Brenner et al, 2002b). Therefore, period estimates of survival may often lie between the survival expectations of the full and the restricted cohorts.

Figure 1

Years of diagnosis and years of follow-up included in various types of 5-year survival analyses. (i) Period analysis for 1999–2003: data in bold frame. (ii) Full cohort 1994–2003: all grey-shaded cells. (iii) Restricted cohort 1999–2003: dark grey-shaded cells only. The numbers within cells indicate years since diagnosis.

In this study, we aim to assess which study population should typically be described along with the presentation of period survival estimates by investigating whether the full or restricted cohort has a survival experience that is closer to the period survival estimate.

Materials and methods

Our analysis is based on data from the Finnish Cancer Registry, which is well known for its very long time series of high-quality cancer registration. In Finland, cancer registration is mandatory by law, and both registration and mortality follow-up are virtually complete. For this analysis, we included patients registered in 1954–2003, that is, within half a century, with 1 of the 23 most common forms of cancer (those with on average >100 cases per year), who were followed up for mortality until the end of 2008. For each type of cancer, we first derived a period estimate of 5-year survival for the time period 1999–2003 (from the data included in the bold frame in Figure 1) and compared it with 5-year survival of the corresponding full cohort (diagnosed in 1994–2003, derived from data included in all grey-shaded cells in Figure 1) and of the restricted cohort (diagnosed in 1999–2003, derived from data included in dark grey-shaded cells only in Figure 1). We then extended this type of comparison to each of the nine 5-year time windows from 1959–1963 to 1999–2003, comparing the period estimates of 5-year survival for those periods with 5-year survival for the corresponding full cohorts (1954–1963 to 1994–2003) and restricted cohorts (1959–1963 to 1999–2003). We calculated mean differences and mean squared differences of 5-year survival of both cohorts from the 5-year period survival estimates. Mean differences quantify to what extent the cohorts on average have lower or higher survival than the period survival estimates. Mean squared differences reflect the average magnitude of differences irrespective of their sign and also take differences in random variation into account. Random variation should typically be smaller (due to larger numbers of patients) for survival estimates of the full cohorts than for those of the restricted cohorts.
In an additional analysis, we also computed mean absolute differences to quantify absolute differences without taking random variation into account.
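The three summary measures can be written down directly. The sketch below (the function name is hypothetical, and the numbers are illustrative, not results of this study) computes, for one cancer site, the mean, mean squared, and mean absolute difference of cohort estimates from period estimates across calendar windows. Differences are taken as cohort minus period, so a negative mean difference indicates lower survival in the cohort. Picking the cohort with the smaller mean squared difference is shown as one simple way to decide which cohort is ‘closer’.

```python
from statistics import fmean


def compare_with_period(period, cohort):
    """Mean, mean squared and mean absolute difference of cohort from period
    estimates (cohort minus period), one pair of estimates per calendar window."""
    if len(period) != len(cohort):
        raise ValueError("need one cohort estimate per period estimate")
    diffs = [c - p for p, c in zip(period, cohort)]
    return {
        "mean_diff": fmean(diffs),
        "mean_sq_diff": fmean(d * d for d in diffs),
        "mean_abs_diff": fmean(abs(d) for d in diffs),
    }


# Illustrative 5-year relative survival (% units) for three windows; not study data.
period = [50.0, 52.0, 55.0]
full = [49.0, 51.5, 54.0]
restricted = [51.0, 53.5, 57.0]

full_stats = compare_with_period(period, full)
restricted_stats = compare_with_period(period, restricted)
closer = "full" if full_stats["mean_sq_diff"] < restricted_stats["mean_sq_diff"] else "restricted"
print(full_stats, restricted_stats, closer, sep="\n")
```

Because squaring weights large deviations more heavily, the mean squared difference is more sensitive to the extra random variation of the smaller restricted cohorts than the mean absolute difference is.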

With improving prognosis over time for many forms of cancer, survival over longer time periods, such as 10 years, is of increasing interest, and the advantages of period analysis over traditional cohort-based survival analysis in terms of better predictions of survival are even larger for such longer follow-up times (Brenner and Hakulinen, 2002b; Brenner et al, 2002b). We therefore repeated the analyses for 10-year survival, starting with the period 1994–1998 as outlined in Figure 2, and proceeding with all seven 5-year periods from 1964–1968 to 1994–1998 (with full and restricted cohorts from 1954–1968 to 1984–1998 and from 1964–1968 to 1994–1998, respectively). The range of time periods is more restricted for the analyses of 10-year survival because 1964–1968 is the first period for which a 10-year period survival estimate and a full cohort estimate of 10-year survival can be calculated from a database that includes patients diagnosed from 1954 onwards. Likewise, 1994–1998 is the most recent period for which 10-year survival can be calculated for the corresponding full and restricted cohorts with follow-up until the end of 2008.

Figure 2

Years of diagnosis and years of follow-up included in various types of 10-year survival analyses. (i) Period analysis for 1994–1998: data in bold frame. (ii) Full cohort 1984–1998: all grey-shaded cells. (iii) Restricted cohort 1994–1998: dark grey-shaded cells only. The numbers within cells indicate years since diagnosis.

To investigate whether the appropriate study population for introductory tables may differ for even longer survival times, we repeated the analysis for 15- and 20-year survival, computed on five 5-year periods from 1969–1973 to 1989–1993 and on three 5-year periods from 1974–1978 to 1984–1988, respectively. To investigate whether results change when longer or shorter calendar periods of interest are used, we repeated the analyses of 5- and 10-year survival using 2-year and 10-year calendar periods. For 2-year calendar periods, 5-year relative survival was estimated for 22 periods (1960–1961 to 2002–2003) and 10-year relative survival for 17 periods (1965–1966 to 1997–1998). For 10-year calendar periods, computations were based on four (1964–1973 to 1994–2003) and three (1969–1978 to 1989–1998) periods, respectively.
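The numbers of calendar windows used in these analyses follow from two constraints on a database with diagnoses from 1954 and follow-up to the end of 2008: the full cohort for a window must start no earlier than 1954 (window start minus survival horizon at least 1954), and the restricted cohort must be followed for the full horizon by 2008. The sketch below (function names are hypothetical) encodes these constraints and derives the full and restricted cohorts for a given window. It assumes windows are laid out backwards from the most recent feasible one, an assumption that reproduces every set of windows listed above.

```python
def feasible_windows(horizon, window_len, first_dx=1954, last_fu=2008):
    """Calendar windows (first, last) for which a `horizon`-year period estimate,
    the full cohort and the restricted cohort can all be computed."""
    # Full cohort starts `horizon` years before the window: needs first - horizon >= first_dx.
    earliest_start = first_dx + horizon
    # Restricted cohort diagnosed up to `last` needs follow-up to last + horizon <= last_fu.
    latest_start = last_fu - horizon - window_len + 1
    windows = []
    start = latest_start
    while start >= earliest_start:  # step back from the most recent feasible window
        windows.append((start, start + window_len - 1))
        start -= window_len
    return windows[::-1]  # oldest window first


def cohorts(window, horizon):
    """Diagnosis-year ranges of the full and restricted cohorts for one window."""
    first, last = window
    return {"full": (first - horizon, last), "restricted": (first, last)}


for horizon in (5, 10, 15, 20):
    w = feasible_windows(horizon, 5)
    print(horizon, len(w), w[0], w[-1], cohorts(w[-1], horizon)["full"])
```

The same `cohorts` helper also makes explicit that the full cohort depends on the survival horizon: for the 1999–2003 window it spans 1994–2003 for 5-year and 1989–2003 for 10-year survival.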

In line with standard practice in population-based cancer registration, relative rather than observed survival was assessed. Relative survival was calculated as the ratio of the observed survival of cancer patients to the expected survival in the general population (Henson and Ries, 1995). Expected survival was calculated from period-specific population life tables according to the so-called Ederer II method (Ederer and Heise, 1959; Hakulinen et al, 2011). Owing to major changes in the age distribution of cancer cases over time, all survival analyses were age-adjusted to the International Standard Cancer Populations proposed by Corazziari et al (2004). All analyses were performed with the statistical software SAS, version 9.2, using publicly available macros for cohort and period analysis of relative survival (Brenner et al, 2002a).
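As an illustration of the two steps described here, the sketch below computes cumulative relative survival as the product of interval-specific ratios of observed to Ederer II expected survival, and then age-standardises by weighting age-specific estimates. It is a simplified stand-in, not the SAS macros themselves: the `Interval` structure and the function names are hypothetical, observed interval survival uses the actuarial adjustment (censored patients count as half at risk), and the weights passed to `age_standardise` are placeholders that the user must supply (the standard weights of Corazziari et al (2004) are not reproduced here).

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Interval:
    at_risk: int             # patients alive and under follow-up at the start of the interval
    deaths: int              # deaths from any cause during the interval
    withdrawals: int         # patients censored during the interval
    expected_p: list[float]  # expected survival probability over the interval for each
                             # patient at risk, taken from population life tables


def relative_survival_ederer2(intervals):
    """Cumulative relative survival: product over intervals of observed
    (actuarial) interval survival divided by Ederer II expected survival."""
    ratios = []
    for iv in intervals:
        effective_n = iv.at_risk - iv.withdrawals / 2  # censored patients count as half at risk
        p_observed = 1 - iv.deaths / effective_n
        # Ederer II: expected survival averaged over those still at risk in this interval
        p_expected = sum(iv.expected_p) / len(iv.expected_p)
        ratios.append(p_observed / p_expected)
    return prod(ratios)


def age_standardise(rs_by_age, weights):
    """Weighted mean of age-specific relative survival estimates."""
    if len(rs_by_age) != len(weights):
        raise ValueError("one weight per age group is required")
    return sum(r * w for r, w in zip(rs_by_age, weights)) / sum(weights)


# Illustrative inputs only; the weights are placeholders, not the published standard.
intervals = [Interval(4, 1, 0, [0.99] * 4), Interval(3, 0, 2, [0.98] * 3)]
print(relative_survival_ederer2(intervals))
print(age_standardise([0.8, 0.6, 0.4], [0.2, 0.3, 0.5]))
```

Averaging expected survival only over patients still at risk in each interval is what distinguishes Ederer II from methods that follow all patients' expected survival from diagnosis onwards.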

Results

Table 1 shows the development of 5-year relative survival for the 23 most common forms of cancer over half a century, from 1954 to 2003. For each included cancer, more than 5000 cases were diagnosed in total, that is, the mean annual number of diagnoses was >100. With >90 000 cases each, the most common cancers were cancers of the lung and breast, followed by colorectal, prostate, and stomach cancer with 60 000 cases each. Case numbers were 20 000 for cancers of the pancreas, bladder, corpus uteri, kidney, oral cavity, and ovaries as well as for non-Hodgkin lymphoma, leukaemia, and skin melanoma, and between 5000 and 15 000 for all other cancers.

Table 1 The 23 most common forms of cancer in Finland in 1954–2003: total number of cases (N) and age-adjusted 5-year relative survival for the earliest (1954–1958) and the most recently diagnosed cohort (1999–2003) of patients

For all cancers except cancer of the oral cavity, 5-year relative survival was <50% among patients diagnosed in 1954–1958. For the majority of cancers, age-adjusted 5-year relative survival increased substantially during this 50-year interval. Increases were strongest, exceeding 0.8 % units per year on average, for prostate cancer, Hodgkin lymphoma, bladder cancer, skin melanoma, kidney cancer, and thyroid cancer (64.3, 55.4, 49.2, 45.8, 44.2, and 43.5 % units, respectively). By contrast, no or only very minor increases (<10 % units overall) were seen for cancers of the oral cavity, pancreas, gallbladder, liver, lung, and oesophagus. For all of these except oral cavity cancer, age-adjusted 5-year relative survival remained <12% even for cases diagnosed in 1999–2003. Averaged across cancer sites, 1-year survival increased by 21.5 % units between the earliest and the most recent period (data not shown). With respect to temporal trends in conditional survival, 4-year survival for patients who had already survived 1 year increased on average by a comparable amount (21.8 % units). One-year survival conditional on 1-, 2-, 3-, and 4-year survival increased by 12.8, 12.7, 6.3, and 6.8 % units, respectively.

Table 2 shows 5-year relative survival for the 1999–2003 period compared with 5-year relative survival of the corresponding ‘full cohorts’ and ‘restricted cohorts’ of patients diagnosed in 1994–2003 and 1999–2003, respectively. With only one exception (ovarian cancer), 5-year relative survival of the full cohorts was very close to the period estimate of 5-year relative survival: differences were <2 % units for 22 of 23 cancers and <1 % unit for 16 cancers. For a majority of 17 cancers, larger differences were seen between 5-year relative survival of restricted cohorts and the period estimates. These differences exceeded 2 % units for five cancers and were particularly large for prostate cancer, Hodgkin lymphoma, non-Hodgkin lymphoma, and breast cancer (+6.21, 4.47, 3.78, and 3.03 % units, respectively), that is, for cancers with major recent increases in survival. For all but six cancers, 5-year relative survival of the restricted cohorts of patients diagnosed in 1999–2003 exceeded the period estimates of 5-year relative survival for 1999–2003.

Table 2 Five-year relative survival for the 1999–2003 period compared to 5-year relative survival of patients diagnosed in 1994–2003 (‘full cohort’) and patients diagnosed in 1999–2003 (‘restricted cohort’)

Regarding period estimates of 10-year relative survival in 1994–1998, differences from survival of both the full cohorts (diagnosed in 1984–1998) and the restricted cohorts (diagnosed in 1994–1998) were somewhat larger and exceeded 2 % units for 9 out of 23 cancers in each case (Table 3). For a majority of 13 cancers, 10-year relative survival of the restricted cohort exceeded the period estimate. By far the largest difference (+17.3 % units) was seen for prostate cancer, with 10-year relative survival of 62.9% for the restricted cohort compared with a period estimate of 45.6%. By contrast, 10-year relative survival of the full cohorts was lower than the period estimates for a clear majority of 19 out of 23 cancers. The full and restricted cohort estimates were closer to the period estimates for 10 and 13 cancers, respectively. Differences between the period estimate and the full and restricted cohort estimates were not related to the change in survival within the investigated period from 1984 to 1998.

Table 3 Ten-year relative survival for the 1994–1998 period compared to 10-year relative survival of patients diagnosed in 1984–1998 (‘full cohort’) and patients diagnosed in 1994–1998 (‘restricted cohort’)

Table 4 summarises the comparisons of the period estimates of 5- and 10-year relative survival with survival of the corresponding full and restricted cohorts, computed on 5-year calendar periods during the 50 calendar years (1954–2003) included in this analysis. Supplementary Figure 1 shows the mean and mean squared differences between the survival estimates according to the improvement in survival over time. Five-year relative survival of full cohorts was on average closer to the period estimates than 5-year relative survival of the restricted cohorts for a majority of 16 out of 23 cancers. The full cohort was closer particularly for cancer sites showing strong increases in survival between 1954 and 2003: for 10 out of 11 cancer sites with increases >30 % units, the full cohort estimate was closer. Absolute values of mean differences were <1 % unit for 19 out of 23 cancers. Whereas 5-year relative survival of full cohorts was on average lower than the period estimate for all cancers, 5-year relative survival of the restricted cohorts was on average higher than the period estimates for all but two cancers (oral and gallbladder cancer), both of which showed minor or no improvements in survival. Mean squared differences in 5-year relative survival were lower for the full cohorts than for the restricted cohorts for 19 out of 23 cancers. Again, this pattern was more pronounced for cancer sites with increases >30 % units between 1954 and 2003.

Table 4 Mean difference and mean squared difference of full cohort and restricted cohort 5- and 10-year relative survival from period 5- and 10-year relative survival computed on 5-year calendar periods, respectively

Regarding 10-year relative survival, differences between period estimates and full and restricted cohort estimates were on average somewhat larger. With three exceptions (gallbladder, prostate, and brain and nervous system cancer), the full cohort estimates were on average lower than the period estimates, whereas the restricted cohort estimates were on average higher than the period estimates for all cancers except oral cavity cancer. Absolute values of mean differences from period estimates were of similar magnitude for full and restricted cohort estimates and, for the restricted cohort, were on average larger for cancer sites with stronger improvements in survival over time. However, mean squared differences were lower for full cohort estimates for a majority of 18 out of 23 cancers.

Comparisons of the period estimates of 15- and 20-year relative survival with survival of the corresponding full and restricted cohorts, computed on 5-year calendar periods during the 50 calendar years, are shown in Supplementary Table S1. As for 5-year survival, absolute mean differences from period estimates were on average lower for full than for restricted cohort estimates for a majority of cancer sites (15-year survival: 16 cancer sites; 20-year survival: 20 cancer sites). Mean squared differences were lower for the full than for the restricted cohort for 19 out of 23 cancers for both 15- and 20-year survival. No strong relationship was observed between the improvement in survival during the period of interest and the difference between the cohort estimates and the period estimates.

Supplementary Tables S2 and S3 summarise the comparisons of 5- and 10-year period survival estimates with full and restricted cohort estimates computed on 2- and 10-year calendar periods. For 2-year calendar periods (Supplementary Table S2), 5-year and 10-year period estimates were on average closer to full than to restricted cohort estimates for 17 and 14 out of 23 cancer sites, respectively. Mean squared differences were on average lower for the full than for the restricted cohort for 22 cancers for 5-year survival and 20 cancers for 10-year survival. For 10-year calendar periods (Supplementary Table S3), 5-year period estimates were again on average closer to full than to restricted cohort estimates (17 out of 23 cancer sites). In contrast, 10-year period estimates were on average closer to restricted cohort estimates for 14 of 23 cancer sites. However, mean squared differences were on average lower for full cohort estimates for both 5-year (16 out of 23 cancers) and 10-year (15 out of 23 cancers) survival. In general, 5-year period survival estimates were closer to full than to restricted cohort estimates particularly when survival improved strongly during the period of interest, irrespective of the length of the period. This pattern was not observed for 10-year survival.

We repeated all analyses using mean absolute differences between the period survival estimates and the full and restricted cohort estimates. In all scenarios, the mean absolute differences between the full cohort estimates and the period survival estimates were smaller than those between the restricted cohort estimates and the period survival estimates for the majority of cancer sites (data not shown).

Finally, we investigated whether the magnitude of the difference between the period survival estimates and the full and restricted cohort estimates depended on the level of the period estimate (data not shown). The differences between period survival estimates and restricted cohort estimates were larger for higher period survival estimates. As a consequence, particularly for cancer sites with higher survival, the full cohort estimates were closer to the period estimates than the restricted cohort estimates.

Discussion

As has been shown by extensive empirical investigations, period analysis provides better predictions of survival of recently diagnosed patients than traditional cohort analysis (Brenner and Hakulinen, 2002a; Brenner et al, 2002b; Talback et al, 2004; Ellison, 2006) and has therefore been increasingly used in cancer survival studies in recent years (e.g. Verdecchia et al, 2007; Sankaranarayanan et al, 2010; Coleman et al, 2011; van de Schans et al, 2011). Until now, it has not been investigated which study population corresponds best to period survival estimates and should thus be described with respect to sociodemographic and clinical characteristics in an introductory table. As a consequence, reporting of the underlying population in period analyses has not been standardised, which hampers comparability across studies.

We investigated two ‘natural candidates’ for the study population in period analysis: the full cohort, including all patients who potentially contributed some data to the survival analysis, and the restricted cohort, including only patients diagnosed in the period of interest. When survival is constant over time, survival of both cohorts is, apart from random variation, the same. Fortunately, however, survival is increasing over time for many cancers (Gondos et al, 2009; Verdecchia et al, 2009; Storm et al, 2010), as also observed for many cancers in Finland. In this case, survival of the full cohort is typically slightly lower, and survival of the restricted cohort typically slightly higher, than the survival estimated by period analysis. This pattern was observed in our results when information from the various periods was pooled. However, for 5- and 10-year survival in the most recent time period, this relationship was observed for only some cancer sites, which might be explained by smaller improvements in cancer survival in recent years and by random error in the survival estimates.

Both the full cohort (Ellison et al, 2007; Verdecchia et al, 2007; Redaniel et al, 2009; Chen et al, 2012) and the restricted cohort (Gondos et al, 2007; Quaglia et al, 2009; Pollack et al, 2011) have often been described in published articles, which shows that identifying the appropriate study population is a relevant issue. Our results show that in most of the investigated situations (5-, 10-, 15-, and 20-year relative survival computed on 5-year calendar periods, 5-year survival on 2-year and 10-year calendar periods, and 10-year survival on 2-year calendar periods), survival estimates for the full cohort were mostly closer to the period estimates than survival estimates for the restricted cohort. Results for 10-year survival on 5-year and 10-year calendar periods were less clear-cut with respect to the mean difference. However, in these situations, the mean squared differences, which quantify not only over- or underestimation but also the degree of random variation, were mostly smaller for the full cohorts. The mean squared difference can be expected to be smaller for the full cohort even when the systematic differences are comparable, because the restricted cohort is a subset of the full cohort and its estimates are therefore subject to more random variation. However, because the mean absolute difference was also smaller for the full than for the restricted cohort in these settings, our results nonetheless suggest that the survival estimate for the full cohort, consisting of all patients who potentially contributed some data to the survival analysis, is overall closer to the period survival estimate than the restricted cohort estimate. The full cohort should thus be described as the study population in an introductory table when period analysis is used.

Agreement of the period survival estimate with the survival of the various types of cohorts is, of course, just one of several possible criteria for selecting the cohort of patients to be described. We believe, however, that it is a particularly relevant criterion, especially when the description of the patient cohort includes distributions of factors closely related to survival, such as stage, cancer subsite, or histopathology, or indicators of data quality of potential relevance to survival estimates, such as the proportion of cases notified by death certificate only (Brenner and Holleczek, 2011). We used the mean difference, the mean squared difference, and the mean absolute difference as criteria for the similarity of survival estimates. Evaluating other criteria, such as the frequency of the direction of deviation, could provide further insight into the relationship between period survival estimates and full and restricted cohort estimates. A limitation of the mean squared difference as a similarity measure is that it might be confounded by heterogeneity across calendar years.

The finding that the full cohorts typically had somewhat lower survival, whereas the restricted cohorts typically had somewhat higher survival than the period survival estimates, might suggest that cohorts including some, but not all, of the years of diagnosis before the period of interest that are included in the full cohorts could be even preferable to the full cohorts. An alternative and theoretically plausible cohort would consist of all patients who actually contribute data to the survival analysis, that is, all patients of the full cohort who have not experienced the event and have not been censored before the start of the period window. An advantage of using this cohort instead of the full or restricted cohort is that its person-time and number of events would be directly related to the data used in the period analysis. However, we feel that the rationale for using such cohorts would be much less straightforward to communicate than that for the full cohorts, and that this disadvantage would outweigh any possible minor benefit in the fit of the survival expectations.
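The three candidate populations differ only in which patients they keep. As a hypothetical sketch (record and function names are illustrative), assuming exact decimal dates of diagnosis and of death or censoring, they can be written as simple filters; the contributing cohort follows the definition given above, keeping full-cohort members who are still alive and uncensored when the period window opens.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    dx: float    # date of diagnosis, decimal calendar year
    exit: float  # date of death or end of follow-up, decimal calendar year


def full_cohort(patients, window, horizon):
    """Patients diagnosed from `horizon` years before the window up to its last year."""
    first, last = window
    return [p for p in patients if first - horizon <= math.floor(p.dx) <= last]


def restricted_cohort(patients, window):
    """Patients diagnosed within the period window only."""
    first, last = window
    return [p for p in patients if first <= math.floor(p.dx) <= last]


def contributing_cohort(patients, window, horizon):
    """Full-cohort members still alive and uncensored when the window opens."""
    first, _ = window
    return [p for p in full_cohort(patients, window, horizon) if p.exit > first]


pts = [Patient(1994.3, 1998.0), Patient(1994.6, 2005.0),
       Patient(2001.2, 2004.0), Patient(1993.5, 2008.0)]
for name, cohort in [("full", full_cohort(pts, (1999, 2003), 5)),
                     ("restricted", restricted_cohort(pts, (1999, 2003))),
                     ("contributing", contributing_cohort(pts, (1999, 2003), 5))]:
    print(name, len(cohort))
```

With exact dates, one could additionally drop patients whose survival horizon ends before the window opens (date of diagnosis plus horizon at or before the window start), since they contribute no person-time; the definition in the text does not spell out this refinement.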

A disadvantage of recommending description of the full cohort instead of the restricted cohort is that the definition of the cohort to be described depends on the survival horizon investigated. For example, if 5- and 10-year period survival estimates for the period 1999–2003 are computed in the same study, the underlying populations will include all patients diagnosed in 1994–2003 and 1989–2003, respectively. In such situations, it might be preferable to describe the restricted cohort (1999–2003), as this cohort is identical for both analyses. On the other hand, describing the full cohort is standard practice in other studies even if certain parts of the analysis pertain to defined subcohorts only. Furthermore, if investigators also wish to report the number of person-years and cases included in the period analysis, these will also be contributed by the full cohort.

Our study has specific strengths and limitations. Strengths include the use of high-quality data from the Finnish Cancer Registry with a very long time series, covering cancer diagnoses over half a century, and the comprehensive investigation of the appropriate study population for various cancer sites, survival times, and lengths of calendar periods. The use of an empirical rather than a theoretical approach might be considered a limitation of this study. Although an empirical investigation closely resembles the later analyses to which the results are meant to apply, the observed differences between period and full and restricted cohort estimates might be affected by chance. To reduce the influence of chance, we included various cancer sites and computed the differences for each possible 5-year calendar period. However, results on mean differences for 10-year survival computed on 10-year periods and 20-year survival computed on 5-year periods were based on only three calendar periods. Nonetheless, our results showed a consistent pattern across various analysis situations.

In conclusion, given the increasing use of period analysis, it has become important to establish which study population should be reported in an introductory table, in order to standardise the reporting of results and thus enhance comparability across studies. Our analysis suggests that, generally, all patients who potentially contributed some data to the survival analysis should be included in the study base for the description of patients’ characteristics.