Immunochemical faecal occult blood test: number of samples and positivity cutoff. What is the best strategy for colorectal cancer screening?

Immunochemical faecal occult blood tests have shown greater sensitivity than the guaiac test in colorectal cancer screening, but the optimal number of samples and the optimal cutoff have still to be defined. The aim of this multicentric study was to evaluate the performance of immunochemical-based screening strategies according to different positivity thresholds (80, 100, 120 ng ml−1) and single vs double sampling (one, at least one, or both positive samples), using a 1-day sample with a cutoff at 100 ng ml−1 as the reference strategy. A total of 20 596 subjects aged 50–69 years were enrolled from Italian population-based screening programmes. Positivity rate was 4.5% for the reference strategy and 8.0 and 2.0% for the most sensitive and the most specific strategy, respectively. Cancer detection rate of the reference strategy was 2.8‰, and ranged between 2.1 and 3.4‰ in other strategies; the reference strategy detected 15.6‰ advanced adenomas (range=10.0–22.5‰). The number needed to scope to find a cancer or an advanced adenoma was lower than 2 (1.5–1.7) for the most specific strategies, whereas it was 2.4–2.7, according to different thresholds, for the most sensitive ones. Different strategies seem to have a greater impact on adenoma than on cancer detection rate. The study provides information to support decisions on screening protocols and their adaptation to local resources.

Faecal occult blood testing (FOBT)-based screening has been proven effective in reducing mortality from colorectal cancer (CRC) (Hewitson et al, 2007). Recently, immunochemical tests (IFOBT) have been shown to be more sensitive than classic guaiac testing, and, as IFOBTs are specific for human haemoglobin (Hb), they do not require dietary restrictions, thus potentially improving screening acceptability. Moreover, some studies (Castiglione et al, 2000; van Rossum et al, 2008) suggested that 1-day IFOBT is more accurate (in terms of sensitivity/specificity ratio) than 3-day guaiac testing. Single sampling would enhance screening feasibility, and likely compliance, which has always been a critical factor (Vernon, 1997; van Rossum et al, 2008). Based on these considerations, the latex agglutination test (LAT) has been adopted as 1-day testing in the Florence screening programme (Castiglione et al, 2000), using a positivity threshold of 100 ng ml−1 Hb of sample solution; the same strategy has since been adopted by all Italian FOBT-based screening programmes (Zorzi et al, 2008).
Quantitative IFOBT assays (such as LAT), which allow the adjustment of positivity thresholds to fit screening aims, are now available. Earlier studies (Itoh et al, 1996; Nakama et al, 1999, 2000; Castiglione et al, 2002; Edwards, 2005; Chen et al, 2007; Levi et al, 2007) have evaluated the impact of using different IFOBT positivity thresholds and single vs multiple sampling in terms of sensitivity/specificity ratio. The aim of this study is to provide further information on the diagnostic accuracy and colonoscopy workload of LAT-based screening using different positivity thresholds (≥80, ≥100 and ≥120 ng ml−1), and single vs multiple samplings (1 vs 2 days).

MATERIALS AND METHODS
This multicentric study involved four CRC population-based screening programmes of Northern (Alto Vicentino, Bussolengo and Feltre Local Health Units) and Central Italy (Florence ISPO Cancer Prevention and Research Institute), currently inviting residents aged 50–69 years to biennial IFOBT screening. The main characteristics of Italian screening programmes have already been described elsewhere (Zorzi et al, 2008). For this study, a special screening protocol was implemented. The study was submitted to and approved by the Ethics Committees of the screening programmes involved. Participation in the study was offered to all subjects of the target population who lived in the areas selected for the study.
Subjects received a dedicated invitation letter, which invited them to screening and explained the aims of the study. Informed consent was required from all attending subjects who agreed to participate in the study. Attendees received two test tubes for collecting faeces from two consecutive bowel movements. People who refused to participate in the study could benefit from routine screening (one sample with positivity threshold at 100 ng ml−1). Tubes were marked to identify the first and the second samplings. Samples were collected and stored at 4°C until test development, usually within a week after collection.
The latex agglutination test assay (OC-Hemodia, Eiken, Tokyo, Japan, distributed by Alfa Wassermann, Milano, Italy) was employed. Quantitative assessment of human Hb was obtained through an antigen–antibody agglutination reaction using human anti-Hb polyclonal antibodies adsorbed on polystyrene particles: agglutination was measured as an increase in absorbance at 660 nm, proportional to Hb content in tested samples. The OC-Hemodia assay was developed by means of the OC-Sensor Micro instruments (Eiken, Tokyo, Japan), supplied by the distributor. Every screening programme had a reference laboratory where tests were processed on its own dedicated instrument. All the laboratories involved in the study cooperated in a national network for interlaboratory quality control, with the aim of monitoring the reproducibility of the FOBT technique.
Participants were instructed to return the test as soon as possible to avoid degradation of Hb in the samples. If the test could not be returned immediately, storage in a domestic refrigerator was advised.
Subjects with >79 ng ml−1 Hb in at least one sample were recalled for colonoscopy assessment, whereas subjects with <80 ng ml−1 Hb received a negative report by mail, recommending biennial rescreening. Incomplete colonoscopy prompted double contrast X-ray barium enema. Samples were considered inadequate in the absence of faecal material. No cutoff based on storage duration was used to define a test as non-analysable. Subjects returning only one adequate sampling were excluded from the study. Subjects returning two inadequate samplings were advised to repeat the sampling.
The following screening strategies were compared: one sample with three different positivity thresholds (≥80, ≥100 and ≥120 ng ml−1), and two samples with the same three thresholds, with at least one or both samples positive. One-day testing was determined as the result of the second sampling to simulate the actual exposure to degradation of Hb in a 1-day strategy.
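The nine strategies described above can be expressed as a simple decision rule on the two Hb measurements. The following is a minimal sketch, not the study's actual software; the function names, the rule labels and the example Hb values are assumptions introduced for illustration only.

```python
from itertools import product

# Positivity thresholds compared in the study (ng/ml Hb)
THRESHOLDS = (80, 100, 120)
# Hypothetical labels for the three sampling rules
RULES = ("one_day", "at_least_one", "both")


def is_positive(first_hb, second_hb, threshold, rule):
    """Return True if a subject is FOBT-positive under one strategy.

    "one_day" uses the second sample only, as in the study design;
    "at_least_one" and "both" use the two samples together.
    """
    if rule == "one_day":
        return second_hb >= threshold
    if rule == "at_least_one":
        return first_hb >= threshold or second_hb >= threshold
    if rule == "both":
        return first_hb >= threshold and second_hb >= threshold
    raise ValueError(f"unknown rule: {rule}")


def classify(first_hb, second_hb):
    """Evaluate one subject under all nine (rule, threshold) strategies."""
    return {
        (rule, threshold): is_positive(first_hb, second_hb, threshold, rule)
        for rule, threshold in product(RULES, THRESHOLDS)
    }


# Illustrative subject: 130 ng/ml in the first sample, 90 ng/ml in the second
result = classify(130, 90)
```

With these illustrative values the subject would be negative under the reference strategy (second sample below 100 ng ml−1) but positive under every "at least one" strategy, which is the kind of discordance the comparison is designed to quantify.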
Advanced adenoma was defined as any adenoma larger than 9 mm, and/or with a villous histological component higher than 20%, and/or with severe dysplasia. When more than one lesion was present, the subject was classified according to the most severe lesion. For this study, 'significant neoplasia' was defined as the sum of cancers and advanced adenomas, and is henceforth referred to as such. Performance was assessed according to positivity rate (PR = proportion of positive FOBT among subjects returning the samples), detection rate (DR = cancers or significant neoplasia detected among 1000 screened subjects) and positive predictive value (PPV = cancers or significant neoplasia detected among 100 FOBT+ colonoscopy assessments). The number needed to scope (NTS) was calculated as the number of FOBT+ colonoscopies needed to find one person with cancer or with significant neoplasia. Screening strategies were compared with the one presently adopted by Italian screening programmes (1-day testing at ≥100 ng ml−1), henceforth referred to as 'reference strategy'.
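The four indicators defined above reduce to simple ratios. The sketch below is illustrative only: the function names are assumptions, and the counts in the usage example are hypothetical round numbers, not study data.

```python
def positivity_rate(n_positive, n_screened):
    """PR: positive FOBT per 100 subjects returning the samples."""
    return 100.0 * n_positive / n_screened


def detection_rate(n_lesions, n_screened):
    """DR: lesions (cancers or significant neoplasia) per 1000 screened."""
    return 1000.0 * n_lesions / n_screened


def positive_predictive_value(n_lesions, n_colonoscopies):
    """PPV: lesions per 100 FOBT+ colonoscopy assessments."""
    return 100.0 * n_lesions / n_colonoscopies


def number_needed_to_scope(n_lesions, n_colonoscopies):
    """NTS: FOBT+ colonoscopies needed to find one subject with a lesion."""
    return n_colonoscopies / n_lesions


# Hypothetical counts, chosen only to make the arithmetic easy to follow
screened, positives, colonoscopies, lesions = 10_000, 500, 400, 20

pr = positivity_rate(positives, screened)                    # percent
dr = detection_rate(lesions, screened)                       # per 1000
ppv = positive_predictive_value(lesions, colonoscopies)      # percent
nts = number_needed_to_scope(lesions, colonoscopies)         # colonoscopies
```

Note that PR and DR use screened subjects as the denominator, whereas PPV and NTS use the colonoscopies actually performed, so incomplete compliance with colonoscopy affects the two pairs differently.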

Statistical analysis
Differences between tests were checked by the χ2 test, with significance set at P<0.05. Differences between means were checked by the Student's t-test and analysis of variance (ANOVA). Confidence intervals at 95% (95% CI) of PR, DR and PPV were calculated using the binomial distribution. To take multiple testing into account, statistical comparisons among different strategies were carried out using the Bonferroni correction (Greenland and Rothman, 1998). Statistical analysis was performed using STATA 8.2 SE software.
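As a rough illustration of the two adjustments described above, the sketch below computes a Bonferroni-corrected significance threshold and a 95% confidence interval for a proportion. Two points are assumptions rather than statements from the study: the figure of eight comparisons (each non-reference strategy against the reference) is inferred from the threshold of 0.00625 quoted in the Results, and the Wilson score interval is used here as a self-contained stand-in, which may not be the binomial method applied in STATA.

```python
import math


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons


def wilson_ci(successes, n, z=1.96):
    """Approximate 95% CI for a proportion (Wilson score interval).

    A stand-in for the binomial CI mentioned in the text; an exact
    interval would give slightly different limits.
    """
    p = successes / n
    denom = 1.0 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Assumed: eight strategies each compared with the reference strategy
threshold = bonferroni_threshold(0.05, 8)

# Hypothetical counts: 50 positives among 1000 subjects
low, high = wilson_ci(50, 1000)
```

Under this assumption, a comparison with P = 0.01 would be significant at the conventional 0.05 level but not after correction, which is why some differences reported in the Results are flagged as exceptions.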

RESULTS
From September 2005 to June 2007, 20 596 subjects aged 50–69 years in their first screening round were enrolled in the study. Twenty-two subjects who returned only one FOBT sample were excluded from the study. Seventeen subjects who returned two inadequate samplings were advised to repeat sampling. Of these, 12 returned two adequate samplings and were included in the study. Two subjects again returned two inadequate samplings, whereas three subjects did not repeat any sampling. These five subjects were excluded from the study.
The principal characteristics of the study cohort are shown in Table 1. Mean attendance rate to the study was 56.2%, ranging from 49.1 to 77.9%. Mean acceptance rate to participate in the study for screening attendees was 96.1%.
Overall, the mean age was 59.9 years (s.d. = 5.6), with no difference between programmes, apart from Florence (mean 61.0 years, s.d. = 5.3), where a higher proportion of subjects aged 60–69 years were enrolled. Overall, a slight prevalence of women (53.8 vs 46.2%) was observed, even more so in the Florence programme (women = 56.5%). Positivity rate varied among centres (range = 6.7–9.2%) at a significant level (P<0.05). Average compliance with colonoscopy was 89.0%, ranging from 96.2% in Alto Vicentino to 83.5% in Florence. Average completed colonoscopy rate was 95.2%.
Positivity rate according to the different screening strategies is reported in Table 2. As expected, 1-day strategies with the lower (≥80 ng ml−1) or higher (≥120 ng ml−1) threshold were associated with a slight increase (+0.9%) or decrease (−0.5%) of PR, respectively, compared to the reference strategy. Positivity rates of 2-day strategies with at least one positive test were notably higher than that of the reference strategy, with differences ranging from +1.4% (threshold ≥120 ng ml−1) to +3.5% (threshold ≥80 ng ml−1). Two-day strategies with both positive test results significantly decreased PR. All differences between each strategy and the reference strategy were statistically significant at P<0.00625 (threshold for significance after Bonferroni correction), except for the 1-day strategy with cutoff ≥120 ng ml−1 (P = 0.01).
Overall, 69 cancers and 465 advanced adenomas were diagnosed. Cancer stage was I in 41 (59.4%), II in 13 (18.8%) and III–IV in 15 (21.7%). No statistically significant difference in stage distribution was found among different strategies (P-values from 0.80 to 0.99), but the study was not designed, and specifically not powered, to detect such differences.
Detection rates for cancer and significant neoplasia are shown in Table 3. The reference strategy detected 2.8‰ cancers. Compared to the reference strategy, 2-day strategies with at least one positive result had a slightly higher DR for cancer (from +0.5 to +0.6‰, according to the positivity cutoff), whereas the more specific strategies (2-day with both positive results) had a slightly lower DR (from −0.5 to −0.6‰). No observed difference in DR reached statistical significance (P = 0.20–0.85).
The reference strategy detected 18.4‰ significant neoplasia, and differences from other strategies were more substantial for significant neoplasia DR than for cancer DR. The most sensitive strategy (2-day with at least one positive sample at ≥80 ng ml−1) allowed for an incremental DR of 7.5‰, whereas the most specific strategy (2-day with both positive samples at ≥120 ng ml−1) decreased the DR by 6.3‰. After Bonferroni correction, all the observed differences in significant neoplasia DR between the reference strategy and strategies based either on both positive samples or at least one positive sample were statistically significant, with the only exception of the 1-day strategy at ≥120 ng ml−1 (P = 0.02). Table 4A shows PPV for cancer and significant neoplasia with different strategies. The PPV of the reference strategy for cancer was 6.9%. None of the differences in cancer PPV between each strategy and the reference strategy was statistically significant at P<0.00625 (threshold for significance after Bonferroni correction), with P-values of 0.00630–0.59.
The PPV of the reference strategy for significant neoplasia was 45.8%. After Bonferroni correction, all the observed differences in significant neoplasia PPV between the reference strategy and strategies based either on both positive samples or at least one positive sample were statistically significant, with the only exception of the 1-day strategy at ≥120 ng ml−1 (P = 0.0485).

DISCUSSION
In recent years, general agreement has grown that IFOBT is superior to guaiac testing for screening purposes (Kahi et al, 2008; van Rossum et al, 2008). The IFOBT also allows for quantitative assay, and the positivity threshold may be adjusted to fit the local clinical and/or economic setting. This issue has been addressed in earlier studies (Nakama et al, 2000; Vilkin et al, 2005; Chen et al, 2007; Levi et al, 2007; Guittet and Launoy, 2008). In this study, we focused on the balance between cancer and advanced adenoma DR and colonoscopy workload by comparing strategies using different numbers of samples and positivity thresholds, as assessed in a population-based screening service setting. To the best of our knowledge, this is the first study that simultaneously evaluated both these parameters in a large screening population sample. Some slight differences in the distribution by age and sex were observed among centres in the study. The higher proportion of older (60–69 years) women in Florence compared with other centres is not due to a difference in the age distribution of the invited population; rather, greater attendance among older women might be explained by the fact that programmes for cervical and breast cancer were activated in Florence many years ago, which may have raised awareness among older women, who are usually less compliant with screening invitations.
Differences occurred among the four centres as far as PR and DR are concerned. Such differences derived from the abovementioned differences in the age and sex composition of the examined population, and probably from a different underlying incidence. Nevertheless, this does not affect the validity of the study, the aim of which was to compare the performance of different FOBT strategies. If we consider the rate ratio (RR) for significant neoplasia between the most sensitive strategy (subject invited to colonoscopy if at least one out of two samples was higher than 80 ng ml−1) and the least sensitive strategy (subject invited to colonoscopy if both samples were higher than 120 ng ml−1), we can observe that this ratio was quite stable among the four centres, without statistically significant differences. In fact, the RRs (95% CI in brackets) were 2.0 (1.5–2.5), 2.2 (1.7–2.7), 2.3 (1.7–3.3) and 2.3 (1.4–3.8) for Florence, Bussolengo, Alto Vicentino and Feltre, respectively. This means that the results obtained for the overall study are substantially valid also within each centre.
From a public health point of view, the choice of the best screening test is a critical issue, which must take into account the existing healthcare organisation and local resources. So far, different choices have been adopted in countries where a CRC screening programme has been (or is going to be) implemented on a national basis. In Japan, the adopted strategy was biennial 2-day IFOBT (Monoheam) at ≥150 ng ml−1, based on the studies of Nakama et al (2000, 2001), suggesting that this strategy allows for the best cost/effectiveness ratio. In the UK pilot study (UK Colorectal Cancer Screening Pilot Group, 2004), a specific strategy was adopted with guaiac testing without rehydration. In Italy, a biennial strategy based on 1-day LAT at ≥100 ng ml−1 has been adopted (Zorzi et al, 2008). This choice was based on several studies (Castiglione et al, 2000, 2002) and on an estimate of programme sensitivity for cancer, which confirmed a higher accuracy of 1-day IFOBT compared to guaiac testing (Zappa et al, 2001). On the whole, with respect to policies adopted in other European countries (UK Colorectal Cancer Screening Pilot Group, 2004), the Italian strategy (biennial one-time IFOBT at ≥100 ng ml−1) favoured a sensitive approach, at the cost of reduced specificity.
Italian results have been confirmed by a recent study of van Rossum et al. (2008), which compared Hemoccult and OC-Hemodia in a randomised trial. Immunochemical faecal occult blood test showed a higher DR for cancer and advanced adenomas than guaiac test.
Comparing the performance of our reference strategy with the results of the Dutch study, it can be noted that PR was substantially similar in the two settings, whereas our detection rate for cancer and advanced adenoma was lower than that found in the Dutch study (24 vs 18.4‰). Likewise, the PPV for significant neoplasia was higher in the van Rossum study than in our study (51.8 vs 45.8%). However, comparison between the two studies should be made with caution, as the difference in detection rates could also relate to confounding variables, such as a higher prevalence of lesions in the Dutch population (Ferlay et al, 2007) compared to the Italian population. The reference strategy results in our study are consistent with those reported by the survey of Italian screening programmes (Zorzi et al, 2008) as far as diagnostic indicators (PR, DR and PPV) and compliance with assessment are concerned. This means that the population enrolled in this study is representative of the general population complying with the screening invitation, thus suggesting a safe generalisability of results.
Table 5 Positivity rate (PR), number of screen detected cancers and advanced adenomas, number of colonoscopies, detection rate (DR) of cancers and advanced adenomas (per 1000 screened subjects) and number needed to scope (NTS) a to find a cancer or a significant neoplasia b by screening strategy
Differences (%) of number of cancers, advanced adenomas and colonoscopies with reference strategy are given in brackets.
a NTS: number of FOBT+ colonoscopies needed to find a cancer or a significant neoplasia.
b Significant neoplasia: cancer + advanced adenomas (any adenoma larger than 9 mm, and/or with a villous histological component higher than 20%, and/or with severe dysplasia).
c Reference strategy: 1-day and cutoff ≥100 ng ml−1.
One of the most important factors to be taken into account when choosing a screening strategy is the acceptability of the test. Several studies (Federici et al, 2005;van Rossum et al, 2008) showed a higher participation rate of invited subjects to 1-day IFOBT compared to 3-day guaiac screening. Several reasons may account for such a preference, such as the absence of dietary restrictions, a better hygiene in handling faecal samples and no need for multiple sampling.
In this study, no remarkable differences in participation rate were observed in any of the screening programmes involved, compared with the rate observed among subjects resident in the same areas who were invited to routine 1-day test screening.
Another important aspect of screening test acceptability is the ease of sampling. Theoretically, multiple samplings should be more difficult to obtain and more unpleasant to handle, as subjects must store the first sample while waiting for the final one. In this study, only 22 of 20 596 recruited subjects did not return a second sample; this, together with the low rate of refusal to enter the study and the observation that attendance was similar to that registered by local service screening using 1-day sampling, suggests that 2-day testing is probably not a major barrier to compliance with returning the test.
It is well known that it takes many years for an advanced adenoma to progress (if at all) to CRC, and such a time window allows for cumulative sensitivity to be achieved by repeat screening. Therefore, a programme with high compliance at repeat screening might favour a screening test with a lower one-shot sensitivity. Conversely, when low compliance is expected at repeat screening, choosing a strategy with a high DR per single screening episode might be crucial.
A second relevant factor to be taken into account is the colonoscopy workload associated with a given strategy. Minimising PR is advisable, as colonoscopy is an unpleasant and potentially harmful procedure, and accounts for a relevant part (about 50%) of total screening costs (Castiglione et al, 1997). The effect of different strategies on colonoscopy workload is quite variable; compared to the reference strategy, the number of required colonoscopies may vary from −54% (more specific) to +77% (more sensitive), whereas that of detected cancers may vary from −23 to +21%. Identifying the optimal strategy was not the aim of this study; however, although based on a limited sample size, our data suggest that minimising PR (by adopting the most specific strategy) is associated with a substantial loss in DR, whereas maximising DR (by adopting the most sensitive strategy) is associated with a hardly acceptable recall rate. Our study shows that different strategies modulate a continuum of opposite changes in sensitivity and specificity. Deciding the optimal cutoff, particularly in the absence of evidence of the efficacy of different strategies, is very difficult and may depend more on considerations of local resources (e.g., colonoscopy facilities) than purely on expected accuracy.
Colonoscopies needed to detect one cancer (i.e., the inverse of PPV) range from 21.2 with the least specific to 8.6 with the most specific strategy, whereas corresponding figures to detect one significant neoplasia (cancer + advanced adenoma) are between 2.7 and 1.5, respectively (see Table 5). In this study, recruited subjects were in their first screening examination. The PPV is expected to decrease in repeat screening, due to a lower prevalence of disease; such a decrease may not have an impact on the differences between strategies, but a recent study by van Rossum et al (2008) suggests increasing the positivity threshold when resources are limited and CRC prevalence is presumably low.
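Because the number needed to scope is the inverse of PPV, it can be recovered directly from the PPVs reported in the Results. The sketch below applies this to the reference strategy (PPV 6.9% for cancer and 45.8% for significant neoplasia); the function name is an assumption, and since the inputs are rounded percentages the outputs may differ slightly from the values tabulated in Table 5.

```python
def nts_from_ppv(ppv_percent):
    """Number needed to scope as the inverse of PPV (PPV given in percent)."""
    if ppv_percent <= 0:
        raise ValueError("PPV must be positive")
    return 100.0 / ppv_percent


# Reference strategy PPVs as reported in the Results (rounded values)
nts_cancer = round(nts_from_ppv(6.9), 1)       # colonoscopies per cancer found
nts_neoplasia = round(nts_from_ppv(45.8), 1)   # per significant neoplasia found
```

On these rounded inputs the reference strategy requires roughly 14 to 15 colonoscopies per cancer and slightly more than 2 per significant neoplasia, which places it between the extremes quoted in the paragraph above.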
As far as sensitivity is concerned, it is worth noting that different strategies seem to have a greater impact on advanced adenoma DR than on cancer DR. In fact, the most sensitive strategy (2-day sampling, at least one sample ≥80 ng ml−1) would obtain a 54% higher cancer DR (3.4 vs 2.1‰) compared to the least sensitive strategy (2-day sampling with both positive results at ≥120 ng ml−1), whereas the DR for advanced adenomas would be more than double (+120%, 22.6 vs 10.0‰). The study of Guittet et al (2007) also supports the hypothesis that lowering the IFOBT threshold will increase the DR of advanced adenomas more than the DR of cancer.

CONCLUSIONS
In conclusion, none of the screening strategies analysed in our study showed clear-cut superiority; rather, we observed a continuum of results for all performance indicators when passing from more specific to more sensitive strategies. The findings of this study do not provide an ultimate answer as to which is the best CRC screening strategy, but they may provide helpful guidance when deciding on the implementation of screening protocols, and when considering how to adapt a screening strategy to the underlying context, based on CRC epidemiology, observed (or expected) programme performance with respect to compliance with invitation, DR and PPV, and the available resources.