Main

During the past decade, there has been mounting interest in the possibility of instituting screening programmes for oral cancer and precancer. Although a disease of relatively low incidence, oral cancer has high morbidity and mortality and appears to fulfil many of the criteria of a disease suitable for screening population groups at risk.1,2,3

Nevertheless, it would be difficult to envisage the formal acceptance of population screening for oral cancer as a part of health policy without its likely costs and benefits, as well as its feasibility and suitability, having been evaluated. The best evidence of the efficacy of a clinical intervention is provided by a randomised controlled trial (RCT), this being the 'gold standard' for evaluations of effectiveness. However, the cost and logistical difficulties of organising and managing a trial of screening for such a relatively uncommon disease would be formidable. A feasible alternative in such circumstances is the use of simulation modelling. This technique synthesises and analyses data collected from multiple sources, including the literature, and is capable of generating valid cost-effectiveness data. Simulation modelling has the added advantage that, by means of sensitivity analysis, it enables a range of screening scenarios to be examined relatively cheaply and the optimum approach identified. An RCT, by contrast, would allow at most a very few programmes to be evaluated and compared simultaneously and would be extremely costly to mount.

The use of a simple model to simulate opportunistic screening of patients at risk of oral cancer in general dental practice and provide a tentative determination of the health gain screening might achieve, together with its cost-effectiveness, has been reported previously.1,2,3 However, the information on the validity of the clinical screening method used in these studies was from very limited sources and the economic data in particular were crude. Therefore, as a preliminary step towards collecting a comprehensive data set on all aspects of oral cancer and precancer screening with a view to constructing a more refined model capable of simulating and evaluating a variety of screening scenarios, it was decided to conduct a comprehensive literature search to assemble a range of reported values for various aspects of screening performance. These would include sensitivity, specificity and other expressions of the validity of clinical oral examination as a screening test, the uptake of screening programmes among target populations and compliance in attendance for follow-up diagnostic examination in screen-detected cases. It was also determined that, where appropriate, meta-analysis of the data would be undertaken.

It was decided that the collection of data on the cost of screening and follow-up diagnostic work, and on the cost of treatment and rehabilitation of oral cancer patients, should be the subject of a separate investigation.

Method

The review was designed to be of low recall and high precision and the remit was limited to a consideration of studies that would yield quantitative information on specific parameters of oral cancer and precancer screening performance. Although not a systematic review as such, it was conducted as far as practicable in accordance with accepted guidelines for such studies.

Inclusion and exclusion criteria

In order to keep the review within manageable bounds and focused on the essential information sought, only research reports within the area of oral cancer and precancer were admitted. Moreover, these had to relate specifically to screening, or to screening programmes, and assess the validity of the screening activity using at least a 'soft' gold standard. Ideally this would comprise a comprehensive diagnostic work-up of all subjects screened positive, but failing that, detailed clinical examination by an oral medicine or surgery specialist would suffice. In order to provide estimates of true- and false-negative rates, it was determined that studies accorded full weight should also include the same diagnostic information on at least a proportion of those screened negative. However, studies from which it was possible to derive only true- and false-positive rates, among those screened and referred to secondary care facilities for further investigation, were also deemed eligible for inclusion. These were collated and examined separately. The inclusion and exclusion criteria are summarised in Table 1.

Table 1

Added value would be gained where study reports provided useful additional information. This would include the type of personnel involved in screening (and details of their training and examiner reproducibility where applicable); a full list of the target lesions classified as positive and those regarded as negative; effect modifiers if specified such as age, gender, tobacco usage and drinking habits of the target group; and whether programmes were invitational or opportunistic. The inclusion of multiple studies based on the same data was avoided. A full list of studies both included in, and excluded from the review was maintained.

Search strategy

A literature search was conducted using the databases PubMed, SCISEARCH and the Cochrane Library. Subject headings specified were 'oral neoplasms', 'oral cancer', 'oral precancer', 'screening' and 'screening programmes'. This yielded a collection of 60 articles of prima facie interest in peer-reviewed journals published up until the end of December 2000. The abstracts displayed indicated that 27 of these papers might be worthy of detailed scrutiny. A further three papers of possible interest, not cited in the databases, were disclosed from hand searching of journals, or through personal contact with other investigators working in the field.

Data synthesis

Global estimates for sensitivity and specificity were obtained from the selected studies using the summary receiver operator characteristic (SROC) curve meta-analytical technique described by Irwig et al.4 The first step of this technique was to utilise the odds ratios from the studies as a measure of the discriminatory ability of the diagnostic procedures. These odds ratios were combined using a standard random-effects meta-analysis within the STATA software package.5 The meta-analysis results were checked for the possibility of skew according to the methods described by Moses and co-workers by use of a weighted least squares regression analysis.6,7 A SROC curve was constructed from the pooled odds ratio (with 95% confidence intervals) by calculating the values of specificity for every possible value of sensitivity. Finally a weighted 'pooled' value for sensitivity was obtained from the studies and the corresponding specificity value (with 95% confidence intervals), for this level of sensitivity, was calculated from the equation of the SROC curve. This value of specificity was utilised to calculate the corresponding point estimate and 95% confidence interval for sensitivity.
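The pooling step described above can be illustrated with a minimal sketch. It assumes the DerSimonian-Laird estimator as the 'standard random-effects' method (the text does not name the estimator), applies a 0.5 continuity correction to each cell, and uses entirely hypothetical 2x2 counts; it is not a reproduction of the STATA analysis.

```python
import math

def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio and its variance (0.5 added to every cell)."""
    tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate, its standard error, and tau^2."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures between-study heterogeneity around the fixed estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2

# Hypothetical studies as (TP, FP, FN, TN); not taken from Table 2
tables = [(18, 12, 5, 965), (30, 40, 8, 1922), (55, 150, 3, 792)]
effects, variances = zip(*(log_dor(*t) for t in tables))
pooled, se, tau2 = dersimonian_laird(effects, variances)

low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled DOR = {math.exp(pooled):.1f} "
      f"(95% CI {math.exp(low):.1f}, {math.exp(high):.1f}), tau^2 = {tau2:.3f}")
```

Pooling is done on the log scale because the log odds ratio is approximately normally distributed; the estimate and its confidence limits are exponentiated only for reporting.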

Results

Six studies captured in the literature search, describing seven substantive or pilot screening programmes, yielded data on the sensitivity and specificity of the specified screening procedure (systematic visual examination of the oral mucosa) or provided sufficient information for these values to be derived.8,9,10,11,12,13,14 The studies and the basic information they yielded are listed in Table 2. Two of these, and the further reports listed in Table 3, provided outcome data on patients referred for secondary care, including numbers of individuals screened, numbers referred, numbers who attended, true- and false-positive rates, and positive predictive values of the test.9,12,15,16,17,18,19,20,21

Table 2
Table 3

With regard to the findings of the first group of studies (Table 2), sensitivity (Sn) values ranged from 0.60 to 0.95. Specificity (Sp) values were at least 0.94 apart from the Sri Lanka study where the basic health workers returned a false-positive rate of 19% (Sp = 0.81).13 In the second group of studies (Table 3), positive predictive values (PPV) among screened individuals attending secondary care facilities ranged widely, from 0.45 in two reports12,15 to 1.00 in the study of Field et al, where prevalence was evidently extremely low and only four cases with precancerous lesions were referred and subsequently attended.16
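For reference, the measures quoted above all derive from a 2x2 table of screening result against gold-standard diagnosis. The following sketch uses hypothetical counts, chosen only so that the false-positive rate comes out at 19% (Sp = 0.81); they are not the Sri Lanka data.

```python
def screening_measures(tp, fp, fn, tn):
    """Validity measures from a 2x2 table of screen result vs gold standard."""
    return {
        "sensitivity": tp / (tp + fn),          # diseased who screen positive
        "specificity": tn / (tn + fp),          # healthy who screen negative
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
        "ppv": tp / (tp + fp),                  # screen positives who are diseased
    }

# Hypothetical counts (not from any study in Table 2 or Table 3)
m = screening_measures(tp=16, fp=190, fn=4, tn=810)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts the PPV is below 0.08 even though sensitivity is 0.80, which illustrates how strongly the PPV depends on prevalence and on the false-positive rate.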

Meta-analysis

Figure 1 shows the ROC curve for the seven studies in Table 2 which were included in the meta-analysis. Figure 2 and Table 4 show the results of the random effects meta-analysis of the discriminatory ability of the screening studies. The test of skew gave a coefficient for the effect of skew on discrimination of 1.64 (95% CI -4.01, 7.30) with a P-value of 0.49, indicating no evidence to suggest that the SROC curve was asymmetrical. The SROC curve for these studies is shown in Figure 3. The weighted pooled value of their sensitivity was 0.796. From the equation for the SROC curve, the corresponding value of specificity at this level of sensitivity was 0.977 (95% CI 0.941, 0.991). When specificity was held at 0.977, the corresponding value of sensitivity from the SROC curve was 0.796 (95% CI 0.594, 0.912).
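The relationship between these figures can be checked with a short sketch. It assumes a symmetric SROC curve, on which the product of the odds of sensitivity and the odds of specificity equals a constant diagnostic odds ratio (DOR). The DOR values below are back-calculated from the rounded estimates quoted above; they are not quoted from Table 4.

```python
def odds(p):
    return p / (1 - p)

def from_odds(o):
    return o / (1 + o)

def sensitivity_on_sroc(specificity, dor):
    """Sensitivity on a symmetric SROC curve: odds(Sn) * odds(Sp) = DOR."""
    return from_odds(dor / odds(specificity))

sn, sp = 0.796, 0.977            # pooled values reported in the text
sp_low, sp_high = 0.941, 0.991   # reported 95% CI for specificity

# DOR implied by the point estimate and by each specificity limit (assumption:
# the limits lie on SROC curves through the pooled sensitivity of 0.796)
dor = odds(sn) * odds(sp)
dor_low = odds(sn) * odds(sp_low)
dor_high = odds(sn) * odds(sp_high)

sn_low = sensitivity_on_sroc(sp, dor_low)
sn_high = sensitivity_on_sroc(sp, dor_high)
print(f"implied DOR = {dor:.0f} ({dor_low:.0f}, {dor_high:.0f})")
print(f"sensitivity at Sp = {sp}: {sn} ({sn_low:.3f}, {sn_high:.3f})")
```

Under this assumption the sensitivity limits come out at approximately 0.59 and 0.91, which agrees with the reported interval of 0.594 to 0.912 to within the rounding of the inputs.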

Figure 1: ROC curve for seven oral cancer screening studies

Figure 2: Random effects meta-analysis of discriminatory ability of seven oral cancer screening studies

Table 4

Figure 3: Summary ROC curve for seven studies

Discussion

The studies listed in Tables 2 and 3, which met the inclusion criteria (Table 1), showed considerable heterogeneity with respect to the country and specific location where they were conducted; the type of personnel carrying out the screening and their training and calibration; the demographic characteristics of the target population; the numbers screened; the types of lesions categorised as 'positive' (target lesions); the prevalence of target lesions; the referral centres and type of personnel representing the gold standard examiners; and whether the programmes were opportunistic or invitational. For example, with regard to the British studies in the first group (Table 2), that of Downer et al8 conducted in a company headquarters, and the general practice (GP) component of the study of Jullien et al,14 were invitational, as was the study of Ikeda et al, in part of the Nagoya conurbation in Japan.9 The hospital component of the study of Jullien et al, however, was opportunistic.10 Mathew et al provided an interim report of a controlled trial of screening, randomised by administrative district, in Kerala, India where there is a high oral cancer incidence.11 Mehta et al described a house-to-house case finding exercise, also in Kerala.12 The reports of Warnakulasuriya and colleagues in Sri Lanka described a similar approach.13,20,21 The British and Japanese studies employed dentists from various practice settings as screeners, without specific training in the first instance and with training and calibration in the second. On the other hand, the studies in the Indian sub-continent, reflecting the economic circumstances prevailing in that part of the world, investigated the use of specifically trained basic health workers as screeners in a potentially low-cost preventive strategy for combating their exceptionally high levels of the disease.

In practical terms, the clear heterogeneity among the studies appeared to have only a moderate influence on the sensitivity and specificity values reported. The discriminatory ability demonstrated by the screening personnel, irrespective of their grade and training, was generally of a fairly high order and did not appear to be greatly affected by the varied circumstances of the studies.

It is rare that the results of screening tests are combined using formal methods such as meta-analysis. It is even less common to see such analyses reported within the dental literature. There are a variety of inappropriate methods that have been employed in attempts to synthesise the results from multiple screening studies. It therefore seems worthwhile to discuss some of the methodological issues that are relevant to the synthesis of data relating to diagnostic tests.

Studies evaluating screening programmes may differ in their thresholds for calling a test result positive, in the present instance possibly as a result of variation in the target lesions specified or systematic variation resulting from training of the screeners, or lack of it. Since the sensitivity and specificity of a screening test are inter-related, changes to the threshold for a positive diagnosis will affect both measures. Thus if investigators consider it essential to detect as many positive cases as possible, the sensitivity of a screening test could theoretically be improved, but only at the cost of reducing specificity. This variation in positive threshold can be usefully summarised with a receiver operator characteristic (ROC) curve. This is achieved by plotting the sensitivity of a test against 1 − specificity (the false-positive rate).
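The trade-off can be made concrete with a small sketch. It assumes, purely for illustration, that each examination yields an ordinal 'suspicion score' from 1 to 4 and that a screen is called positive when the score reaches a chosen threshold; the scores below are hypothetical.

```python
def roc_points(diseased_scores, healthy_scores):
    """(threshold, 1 - specificity, sensitivity) for each possible threshold."""
    thresholds = sorted(set(diseased_scores) | set(healthy_scores), reverse=True)
    points = []
    for t in thresholds:
        # A subject screens positive when the score is at or above the threshold
        sn = sum(s >= t for s in diseased_scores) / len(diseased_scores)
        fpr = sum(s >= t for s in healthy_scores) / len(healthy_scores)
        points.append((t, fpr, sn))
    return points

# Hypothetical suspicion scores (1 = clearly normal, 4 = highly suspicious)
diseased = [4, 4, 3, 3, 3, 2, 4, 2, 3, 1]
healthy = [1, 1, 2, 1, 1, 3, 1, 2, 1, 1]

points = roc_points(diseased, healthy)
for t, fpr, sn in points:
    print(f"threshold >= {t}: 1 - Sp = {fpr:.2f}, Sn = {sn:.2f}")
```

Lowering the threshold moves the operating point up and to the right along the curve: sensitivity rises, but so does the false-positive rate.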

A method for combining the results of several studies must account for both the discriminatory ability of each study and the variation in diagnostic threshold. It is inappropriate to directly pool the results of each investigation since the differing prevalence of positive lesions across studies acts as a confounding factor when diagnostic thresholds differ. This is particularly likely to be a problem if there is a wide range of prevalence of disease across different studies (eg if trying to combine studies from different populations such as the United Kingdom and India). It is also inappropriate to calculate sensitivity and specificity separately within each study and to attempt to derive a weighted average of each measure (eg with weights being based on study size). This avoids the problem of confounding, but may still lead to an underestimate of the true accuracy if there is variation in the diagnostic thresholds used by different investigators.4
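A worked example may help to show why both shortcuts mislead. The two studies below are hypothetical and constructed to have the same discriminatory ability (a diagnostic odds ratio of 81) while differing in threshold and prevalence. For brevity the second shortcut is shown with a simple rather than a weighted average.

```python
def odds(p):
    return p / (1 - p)

def sn_sp(tp, fn, fp, tn):
    return tp / (tp + fn), tn / (tn + fp)

def dor(sn, sp):
    """Diagnostic odds ratio from sensitivity and specificity."""
    return odds(sn) * odds(sp)

# Hypothetical (TP, FN, FP, TN) counts
study_a = (50, 50, 100, 8100)    # low prevalence, strict threshold
study_b = (900, 100, 100, 900)   # high prevalence, lax threshold

sn_a, sp_a = sn_sp(*study_a)
sn_b, sp_b = sn_sp(*study_b)

# Shortcut 1: add the tables together and recompute
pooled = tuple(a + b for a, b in zip(study_a, study_b))
dor_pooled = dor(*sn_sp(*pooled))

# Shortcut 2: average sensitivity and specificity separately
dor_avg = dor((sn_a + sn_b) / 2, (sp_a + sp_b) / 2)

print(f"DOR study A = {dor(sn_a, sp_a):.0f}, study B = {dor(sn_b, sp_b):.0f}")
print(f"DOR from pooled counts = {dor_pooled:.0f}")
print(f"DOR from averaged Sn and Sp = {dor_avg:.1f}")
```

In this construction, direct pooling gives a point well off the curve shared by both studies (an apparent odds ratio of 285), while averaging gives an odds ratio of about 39, an underestimate of the common value of 81.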

Since sensitivity and specificity are inter-related and since both diagnostic thresholds and disease prevalence may vary, the correct approach to combining the results of several studies is to use meta-analysis to produce a summary receiver operator characteristic curve (SROC) as previously described. If the odds ratio is constant across different thresholds this will lead to a symmetric SROC curve. Thus, if positive and negative screening results change the odds equally, then the diagnostic threshold is not skewed (biased) toward either diagnosis. To gain a global estimate of sensitivity and specificity it is necessary to derive a weighted estimate for one measure and then calculate the corresponding value (and confidence interval) for the other measure using the results of the meta-analysis.
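The symmetry condition can be sketched in the form used by Moses and co-workers: for each study, D = logit(Sn) − logit(1 − Sp) is the log odds ratio and S = logit(Sn) + logit(1 − Sp) is a proxy for the diagnostic threshold, and D is regressed on S. A slope near zero indicates a symmetric curve. The sketch below uses ordinary rather than weighted least squares for brevity, and its data points are hypothetical, generated with a constant odds ratio of 81.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def moses_regression(points):
    """Fit D = a + b*S by ordinary least squares.

    points: list of (sensitivity, specificity) pairs, one per study.
    Returns (a, b); b close to 0 suggests a symmetric SROC curve.
    """
    d = [logit(sn) - logit(1 - sp) for sn, sp in points]
    s = [logit(sn) + logit(1 - sp) for sn, sp in points]
    n = len(points)
    s_mean, d_mean = sum(s) / n, sum(d) / n
    sxx = sum((x - s_mean) ** 2 for x in s)
    sxy = sum((x - s_mean) * (y - d_mean) for x, y in zip(s, d))
    b = sxy / sxx
    return d_mean - b * s_mean, b

def specificity_for(sn, dor):
    # Specificity giving the stated odds ratio at this sensitivity
    o = dor * (1 - sn) / sn
    return o / (1 + o)

# Hypothetical studies sharing an odds ratio of 81 at different thresholds
points = [(sn, specificity_for(sn, 81.0)) for sn in (0.5, 0.7, 0.9)]
a, b = moses_regression(points)
print(f"slope b = {b:.3g}, exp(a) = {math.exp(a):.1f}")
```

When the slope is zero, exp(a) is the common odds ratio and the curve reduces to odds(Sn) × odds(Sp) = exp(a), which is the symmetric case described above.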

The meta-analysis described is considered to have yielded an appropriate range of values, for the particular variables examined, for input to a simulation model. With the addition of other necessary data and the use of sensitivity analysis, this could then be used to ascertain the likely cost-utility of different oral cancer and precancer screening scenarios. This will be the subject of further research.
