Do women ⩾50 years of age need as much screening as women <50 years after they have had negative screening results?

To assess the adequacy of routine screening to identify cervical intraepithelial neoplasia 2 or worse (CIN2+) in women over 50 years of age, a retrospective cohort was set up in six Italian organised population-based screening programmes. In all, 287 330 women (1 714 550 person-years of observation, 1110 cases) screened at age 25–64, with at least two cytological screening tests, the first negative, were followed from their first negative smear until a biopsy-proven CIN2+ lesion or their last negative smear. For women aged 25–49 and 50–64 years, crude and age-standardised detection rate (DR), cumulative risk (CR), adjusted hazard ratio by number of previous negative screens, and probability of false-positive CIN2+ after two or more smear tests were calculated. Detection rate is significantly lower over 50 years of age. Multivariable analysis shows a significant protective effect from four screening episodes (DR=0.70, 95% CI: 0.51–0.97); the effect of age ⩾50 is 0.29 (95% CI: 0.24–0.35). The CR of CIN2+ is at least eightfold higher in women <50 (CR=2.06, 95% CI: 1.88–2.23) after one previous negative test than in women ⩾50 years with four screens (CR=0.23, 95% CI: 0.00–0.46). Over 50 years of age, after four tests, at least three false-positive cases are diagnosed for every true positive. The benefit arising from cytological screening is uncertain in well-screened older women.

European recommendations state that cervical cancer screening should start at the latest at 30 years and stop not before 60 years of age. Screening intervals should be between 3 and 5 years (Coleman et al, 1993).
International Agency for Research on Cancer recommendations for cervical cancer screening suggest 5-yearly screening over 50 years of age; 3-yearly screening is recommended at age 25–49 years in countries with the necessary resources (International Agency for Research on Cancer, 2005).
Overtreatment is intrinsic to cervical cancer screening: to prevent one invasive cancer, seven to eight preinvasive lesions need to be treated (Ponten et al, 1995).
Considering invasive cervical cancer, several studies (International Agency for Research on Cancer, 1986; Gustafsson et al, 1995; Cruickshank et al, 1997; Coldman et al, 2005; Ronco et al, 2005) show a trend of decreasing risk with increasing number of smear tests.
The low risk level in well-screened women after 50 years of age might not justify the costs and harms associated with screening, such as overdiagnosis, anxiety, unnecessary treatment and reduction in quality of life associated with false-positive results (International Agency for Research on Cancer, 2005). On the basis of age and screening history, a subset of women could be identified, for whom the risk of preinvasive and invasive lesions is negligible and screening might be safely stopped before the currently recommended age limit (Sherlaw-Johnson et al, 1999) or different screening intervals might be adopted at different ages (International Agency for Research on Cancer, 2005).
The balance between the risk of being diagnosed with a preinvasive or invasive lesion and that of getting a false-positive cytohistological report must be discussed considering ethical and practical implications.
The objective of the present study is to estimate the detection rate (DR) and the cumulative risk (CR) of CIN2 or more severe lesions (CIN2+) and to compare them with the probability of false-positive episodes in women <50 or ⩾50 years of age, with at least two cytological screening tests, the first negative, as observed in six Italian organised population-based screening programmes.

Study design
The study is based on a retrospective cohort of women screened at age 25–64 years, who had at least two cytological screening tests by 31 December 2000, the first being reported as negative. Women were recruited from six Italian organised population-based screening programmes (Ferrara, Florence, Mantova, Ravenna, Turin and Viterbo). In these programmes, women aged 25–64 years are invited by mail every 3 years, and monitoring systems and quality assurance have been established according to the European guidelines.
Women were followed from the date of their first negative smear test available in the archives until a biopsy-proven CIN2+ lesion was diagnosed or until their last negative smear result (performed by 31 December 2000). We considered as negative all smears with either a negative cytology or a positive cytology followed, within 300 days of the smear test, by a biopsy negative for CIN2+, as these women are readmitted to routine screening. Follow-up continued for women with a histological diagnosis less severe than CIN2+. The histological diagnosis was issued by 30 September 2001 following a smear test performed by 31 December 2000.
As different cytological classifications were used in screening centres, data were standardised according to a common cytological and histological classification (Bethesda NCI Workshop, 1993).
Negative smear tests performed within 300 days of the oldest available test for clinical or technical reasons were treated as a single screening episode, as recommended in the screening protocols of the programmes that supplied data. To count the previous negative screening episodes, we identified an index test for each woman. In a screening setting, the index test triggers further assessments to ascertain the occurrence of the studied lesion; we therefore did not count it among the negative tests, and we used it to define the time of exit from the cohort. For negative women, the last available negative smear was considered as the index test. For the cases, four different situations occurred: (1) the index test was the most recent smear performed within 300 days of the CIN2+ histological report; (2) when two or more tests were performed within 300 days of the positive histological result, the one with the worst diagnosis was chosen; (3) if the last smear test was performed more than 300 days before the histological diagnosis, this test was identified as the index if positive; (4) if the last smear test was performed more than 300 days before the histological diagnosis but was negative, the date of the histological diagnosis was considered as the date of the index test.
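The four situations described for cases can be sketched as a small selection routine. This is only an illustration under stated assumptions: the `Smear` record, its ordinal `severity` field (0 = negative, higher = worse) and the function name are hypothetical, and breaking ties by the most recent date is our own choice, not something stated in the text.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

WINDOW = timedelta(days=300)  # 300-day window used in the study design


@dataclass
class Smear:
    taken: date
    severity: int  # hypothetical ordinal grade: 0 = negative, higher = worse


def index_date_for_case(smears: List[Smear], histology: date) -> date:
    """Return the index-test date for a woman with a CIN2+ histology.

    Assumes the woman has at least one smear on or before the histology date.
    """
    prior = [s for s in smears if s.taken <= histology]
    near = [s for s in prior if histology - s.taken <= WINDOW]
    if near:
        # Situations 1 and 2: a single smear in the window is the index test;
        # with several, take the worst diagnosis (ties: most recent, assumed).
        return max(near, key=lambda s: (s.severity, s.taken)).taken
    last = max(prior, key=lambda s: s.taken)
    # Situation 3: last smear older than 300 days and positive -> that smear.
    # Situation 4: last smear older than 300 days and negative -> histology date.
    return last.taken if last.severity > 0 else histology


history = [Smear(date(1995, 3, 1), 0), Smear(date(1998, 3, 1), 1),
           Smear(date(1998, 6, 1), 3)]
print(index_date_for_case(history, date(1998, 9, 1)))
```

In the example history, two smears fall within 300 days of the histological report, so the one with the worse grade is selected.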

Statistical analysis
Person-years were computed from the date of the oldest negative smear test available in the screening history. For negative women, the index test was taken as the end point of the observation period, whereas for cases the end point was the midpoint between the index test and the last available negative smear test, as we assumed that the lesion occurred on average at the midpoint of the interval between two tests.
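The end-point rule can be illustrated with a short calculation. This is a minimal sketch: the function name, its arguments and the 365.25-day year are assumptions made for illustration, and the dates are invented.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed conversion factor, not stated in the text


def person_years(first_negative: date, last_negative: date,
                 index_test: date, is_case: bool) -> float:
    """Person-years contributed by one woman.

    Negative women: observed from the first negative smear to the index test.
    Cases: observed to the midpoint between the last negative smear and the
    index test.
    """
    if is_case:
        days = ((last_negative - first_negative).days
                + (index_test - last_negative).days / 2)
    else:
        days = (index_test - first_negative).days
    return days / DAYS_PER_YEAR


# Invented dates: same screening history, once as a case, once as a negative woman.
as_case = person_years(date(1994, 1, 1), date(1997, 1, 1), date(2000, 1, 1), True)
as_negative = person_years(date(1994, 1, 1), date(2000, 1, 1), date(2000, 1, 1), False)
print(round(as_case, 2), round(as_negative, 2))
```

With identical smear dates, the case contributes less observation time than the negative woman because her follow-up is cut at the midpoint of the last interval.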
Person-years were calculated according to time-dependent variables: the number of previous negative screening episodes (five categories: 1, 2, 3, 4 and 5+) and 5-year age groups.
The crude and age-standardised (direct method) CIN2+ DR, calculated as cases per person-years, the CR, calculated by applying the exponential formula, and their 95% confidence intervals (95% CI) were calculated for women of 25–49 and 50–64 years of age. It must be noted that the age intervals used to calculate the CR have different lengths. A score test for trend of DRs by age class was computed (StataCorp, 2005).
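As a rough illustration of these quantities, the sketch below computes a crude DR per 10 000 person-years and a cumulative risk. Two points are assumptions rather than the authors' stated method: the confidence interval uses a log-normal approximation, and the "exponential formula" is read as CR = 1 − exp(−Σ wᵢ·DRᵢ), where wᵢ is the width in years of age class i. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def detection_rate(cases: int, person_years: float,
                   per: float = 10_000) -> Tuple[float, float, float]:
    """Crude rate with an approximate 95% CI (log-normal approximation, assumed)."""
    rate = cases / person_years
    factor = math.exp(1.96 / math.sqrt(cases))
    return rate * per, rate / factor * per, rate * factor * per


def cumulative_risk(rates: Sequence[float], widths: Sequence[float]) -> float:
    """Cumulative risk from age-specific rates (per person-year) and class widths."""
    cumulative_rate = sum(r * w for r, w in zip(rates, widths))
    return 1.0 - math.exp(-cumulative_rate)


# Whole-cohort figures reported in the text: 1110 cases, 1 714 550 person-years.
dr, low, high = detection_rate(1110, 1_714_550)
print(f"DR = {dr:.2f} per 10 000 person-years ({low:.2f} to {high:.2f})")

# Invented example: a constant rate of 2 per 10 000 over three 5-year age classes.
print(f"CR = {100 * cumulative_risk([2e-4] * 3, [5] * 3):.2f}%")
```

For the whole cohort this gives a crude DR of about 6.5 per 10 000 person-years. Because the cumulative rate is a sum over class widths, a 25-year age band accumulates more risk than a 15-year band at the same rate, which is why the different interval lengths matter.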
For women with two or more smear tests, we estimated the joint probability of a false-positive CIN2+ diagnosis, when both the cytology and the histology are false positive. We used the formula: where V_cyt and V_his are the specificities of cytological and histological diagnosis and s is the number of screening episodes. We assumed the following combinations of cytological and histological specificities, respectively: 0.97–0.94, 0.96–0.95, 0.95–0.96 and 0.94–0.97 to calculate the number of false positives (FP). We estimated the probability of true-positive results (TP) as the probability of positive results (the CR per 10 000) minus the number of FP. The specificity values used are more favourable than the range of cytological and histological specificity reported in the literature for HSIL/CIN2+ (Nanda et al, 2000; Stoler and Schiffman, 2001), and they represent the minimum values that, combined together, allow the number of TP to be estimated. We estimated the ratio between TP and FP, and the ratio between CR and FP.
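The formula itself is not reproduced in this text, so the following is only a sketch of arithmetic that appears consistent with the reported figures, not the authors' formula. It assumes that the cytological and histological errors are independent, so that the per-episode probability of a joint false positive is (1 − V_cyt)(1 − V_his), and it reads the CR values in the abstract (2.06 and 0.23) as percentages. Function names are hypothetical.

```python
def false_positives_per_10000(spec_cyt: float, spec_his: float) -> float:
    """Per-episode joint false positives per 10 000, assuming independent errors."""
    return (1 - spec_cyt) * (1 - spec_his) * 10_000


def tp_fp_ratio(cr_per_10000: float, fp_per_10000: float) -> float:
    """True positives are the positives (CR per 10 000) minus the false positives."""
    return (cr_per_10000 - fp_per_10000) / fp_per_10000


for v_cyt, v_his in [(0.97, 0.94), (0.96, 0.95)]:
    fp = false_positives_per_10000(v_cyt, v_his)
    young = tp_fp_ratio(206, fp)  # CR 2.06%: women <50, one previous negative test
    older = tp_fp_ratio(23, fp)   # CR 0.23%: women >=50, four screens
    print(f"V_cyt={v_cyt}, V_his={v_his}: FP={fp:.0f}, "
          f"TP/FP <50: {young:.2f}, TP/FP >=50: {older:.2f}")
```

Under these assumptions the per-episode FP is 18 per 10 000 for the 0.97/0.94 pair and 20 per 10 000 for the 0.96/0.95 pair. Summed over five episodes this gives about 90 and 100, and the resulting ratios in younger women (10.44 and 9.30) match the values quoted in the Results.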
The effect of multiple factors on the risk of CIN2+ was modelled using Cox regression for left-truncated and right-censored data. We considered five categories of the number of previous negative screens and age <50 or ⩾50 years as time-dependent variables, whereas the interval between the index test and the last negative smear was considered as a fixed variable. Calendar time was used as the time axis. We calculated P-values for trend for the number of previous negative screens as an ordinal variable. P-values less than 0.05 were considered statistically significant.

RESULTS
Screening data were available from the six programmes. There were 569 713 screening episodes, as defined in the Material and methods section. The study included 287 330 women for a total of 1 714 550 person-years of observation and 1110 CIN2+ cases.
Distribution of the study population and DR (95% CI) by screening programmes is described in Table 1.
When comparing the 25–49 vs 50–64 age groups (Table 3), DR was consistently lower in the latter in all subgroups defined according to the number of previous negative screens. A small, statistically non-significant protective effect was observed in women above 50 years of age with more than one previous negative smear test (score test for trend of rates: P-value>0.005).
Multivariable analysis adjusting for screening programme, age <50 or ⩾50 years and interval between index and last negative smear demonstrates a statistically significant protective effect of four or more previous negative tests, but the effect of age ⩾50 years was much stronger (Table 4).
The CR of CIN2+ (Table 5) in women <50 years of age with one previous negative test is at least fivefold higher than in older women with one test, rising to about eightfold when comparing younger women with one screen to women of age ⩾50 years with four tests.
When considering a cytological specificity of 0.97 and a histological specificity of 0.94 or the opposite combination, the cumulative probability of an FP CIN2+ is about 90, resulting from the sum of FP in each screening episode (Table 6). The ratio between TP and FP diagnoses is about 1 for women above 50 years of age who underwent one, two or three screening episodes, and it falls to 0.28 and 0.29 after four or five episodes, respectively (Table 6). In younger women, however, this ratio is always above 10.44. The situation is similar when considering a cytological specificity of 0.96 and a histological specificity of 0.95 or the opposite combination. In this case, the cumulative probability of an FP CIN2+ is about 100; the ratio between TP and FP in women below 50 years of age ranges from 9.30 to 14.34 (i.e., it is lower than with the other combination), but in women above 50 years of age the ratio is always below 1 and reaches 0.16 after four or more screening episodes.

DISCUSSION
In women ⩾50 years of age, the CR of CIN2+ is significantly lower than in younger women, irrespective of the number of previous negative screening episodes. Combining the effect of age and the effect of the number of previous negative smears, for every true-positive CIN2+ diagnosis we may expect between 1 and 6.3 false-positive cases in the worst scenario, according to the combination in Table 6.
The present study is based on a large multicentre cohort and a large number of person-years; nevertheless, a number of points in the study design need to be discussed.
Ideally, the protective effect of cervical screening should be estimated on the risk of invasive carcinoma. Nevertheless, we chose CIN2+ as the outcome because in the whole cohort (287 330 women, 1110 CIN2+ cases) we found only a few invasive carcinomas (n = 61). The incidence of cervical cancer in the study population is not comparable to that of the general population, as women are selected for having had at least two cytological screening tests, the first negative. Moreover, the screening protocol refers women with a CIN2+ diagnosis to treatment.
Usually DR is calculated per smear, as it is used to estimate lesion prevalence, whereas here it was calculated as cases per person-years to take into account the screening history and the effect of previous tests on diagnosis. We stratified the cohort choosing 50 years of age as the cutoff. Even though the age groups have different lengths (25 and 15 years), this choice was based on previous studies showing a strong risk reduction for CIN2+ after 50 years of age (Gustafsson et al, 1995; Gram et al, 1998; International Agency for Research on Cancer, 2005). Also, our results show that the DR of CIN2+ decreases significantly from 4.17 in the 45–49 age group to about 2 after 50 years of age (Table 2). When stratifying by 5-year age classes and previous screening episodes, after two episodes the DR before 50 years is double that in older women (data not shown). To verify that the use of two groups of 25 and 15 years in length would not affect our study conclusions, we estimated CR using 25–44 and 45–64 age groups, and the results show that the risk in younger women is at least threefold higher than that in older ones (the ratio ranging from 3.3 to 6.3).
Figure 1 Detection rate (per 10⁻⁴ person-years) of CIN2+ by age class.
As shown in Table 6, one critical point in cervical screening is the specificity of cytohistological diagnosis.
We estimated the joint probability of a false-positive CIN2+ diagnosis within a screening episode, as it depends on the probability of being a false-positive case at both the primary and the assessment test, for the adopted diagnostic category.
Several meta-analyses (Fahey et al, 1995; Nanda et al, 2000) have shown that cytological specificity ranges from 14 to 97% (Fahey et al, 1995). We assumed a specificity ranging from 0.94 to 0.97, according to estimates from studies on low-risk screening attenders (Nanda et al, 2000). Lower values did not allow the estimation of the true positives; in any case, it is very unlikely that a specificity as low as 0.60–0.70 would occur in a modern cytological screening setting (International Agency for Research on Cancer, 2005).
The reproducibility of histological diagnosis might be as questionable as that of cytological diagnosis. Studies of cervical biopsies have shown fair-to-poor interobserver and intraobserver agreement in reporting (Robertson et al, 1989). We assumed histological specificity to be at least 0.94 according to the findings of the ASCUS-LSIL Triage Study (Stoler and Schiffman, 2001), which reviewed 2237 original histological slides.
According to these assumptions, the ratio between true-positive and false-positive results is almost always above 10 under 50 years of age, whereas among older women, for each real case identified, one false positive is also diagnosed. After four tests, at least three false-positive cases are diagnosed for every true positive (Table 6). These considerations also hold in the case of two 20-year age groups, that is 25–44 and 45–64 (data not shown). Yet, in our results, the effect is even stronger, thus demonstrating that the benefit of screening over 50 years of age is uncertain.
False-positive results are associated with unnecessary assessment and its complications, adverse effects of treatment, unnecessary treatment, adverse effects of labelling or early diagnosis, anxiety and costs generated by investigations and treatment (International Agency for Research on Cancer, 2005). Hence an effort to increase specificity is needed, especially in older age groups.
The CIN2+ DR differs among the participating centres, probably because of real differences in incidence rather than diagnostic variability among laboratories. Indeed, the data available on the reproducibility of cytological diagnosis in Italy showed that agreement was generally good (Montanari et al, 2003).
As the cohort includes women from six different Italian screening programmes, with different periods of observation, the CIN2+ DRs must be interpreted with caution. For this reason, we performed an analysis on the Florence and Turin data only, as they account for 75% of the observed women. The estimates of the DRs, ratios and trends are very similar to those calculated on the whole cohort.
Centres with the highest CIN2+ incidence (Ravenna, Ferrara and Mantova) made a limited contribution to the cohort in terms of person-years, but they provided a large number of cases; this could have attenuated the stronger effect observed in a previous analysis (Armaroli et al, 2005). Thus, decision makers should take into account the local CIN2+ prevalence when implementing local intervention strategies, as in single areas the probability of being a case might be higher than that of being a false positive.
Women who undergo screening more frequently may have a lower risk of cervical disease because of a healthier lifestyle and better access to treatment (Ronco et al, 1991). In the present study, DRs are also low because women were selected as having had at least one negative smear before the index test. Moreover, survival within the cohort is conditional on not having shown a previous high-grade cervical lesion, as follow-up is stopped when a CIN2+ lesion occurs; that is, the probability of being positive at the last Pap smear is conditional on having accumulated previous negative tests.
Detection rates are affected by the duration of sojourn time and by the proportion of lesions that regress. Moreover, the sensitivity of the smear test for long-sojourn-time lesions such as CIN2 (International Agency for Research on Cancer, 1986) depends on the number of smear tests performed during the sojourn time; on the other hand, frequent testing increases the DR of those significant lesions. This last effect may explain the observed risk reduction when the number of screening episodes increases, particularly after four negative tests.
The results of the present study are in agreement with other studies showing a risk reduction with increasing age (Cruickshank et al, 1997; Sasieni et al, 2003; Coldman et al, 2005) and with the number of previous negative smear tests (Coldman et al, 2005).
Few cases of preinvasive lesions were diagnosed ex novo in well-screened women aged over 50 years (two or more 3- to 5-yearly negative screens) (Van Wijngaarden and Duncan, 1993), in women with three consecutive (at most 3-yearly) negative screens before 50 years of age (Cruickshank et al, 1997) or in women with at least three negative smear tests between 41 and 49 years of age (Gustafsson et al, 1995).
Considering invasive cervical cancer, a trend of decreasing risk with increasing number of smear tests is reported (International Agency for Research on Cancer, 1986; Gustafsson et al, 1995; Coldman et al, 2005).
In Turin (Italy), invasive carcinoma incidence during 1992–1998 was reduced by 75% in women who attended at least one screening, as compared to nonattenders (3.0 vs 9.5/100 000 person-years), the latter showing the highest incidence.
The results of the present study, in accordance with other reports, may suggest that the adequacy of a routine screening test to identify early lesions in women over 50 years of age with at least four previous negative screens is questionable; possible alternative strategies may be explored. Women might be involved in the decision whether to stop screening or to undergo just one more smear test in their life, after evaluating their individual risk of a CIN2+ at further screening through algorithms based on age, screening history and DRs specific to their area of residence. Comparing the future individual risk of being a case or a false positive may support and strengthen individual choices. The decision to stop screening may also be supported by a negative result of HPV testing.
Informing women about the risk related to changes in their and their partners' sexual habits (Brisson et al, 1994) may allow spontaneous return to the usual screening protocols.
Such strategies agree with the IARC recommendations for the implementation of cervical cancer screening (International Agency for Research on Cancer, 2005) and with the IARC Working Group statement that there is little benefit from screening old women who have always tested negative in an organised screening programme. In particular, for women over 50 years of age, the Working Group recommends a 5-year screening interval.
The results of our study support the opinion that the benefit arising from cytological screening is uncertain in older women. Ethical and practical considerations subsequent to a reduction in screening intensity must be taken into account. A possible consequence might be an increase in the incidence of invasive lesions, set against a major saving of resources. Estimates of unprevented cervical cancers are of the order of about two cases per 100 000 person-years (Sherlaw-Johnson et al, 1999). It is thus desirable to evaluate whether the benefits arising from reallocating the saved resources to more cost-effective interventions would make it acceptable to reduce or stop screening in women aged 50 years or older with a documented negative screening history.
Table 6 Cumulative risk (CR), probabilities of cytological and histological false-positive diagnosis and of true positive per 10 000, ratio between true-positive and false-positive, ratio between CR and false-positive, by screening episodes, for different combinations of cytological and histological specificities
Adequacy of cervical screening in older women. P Armaroli et al