PRACTICE
JANUARY 25 2003, VOLUME 194, NO. 2, PAGES 73-78
Further statistics in dentistry Part 8: Systematic reviews and meta-analyses
A. Petrie,1 J. S. Bulman2 and J. F. Osborn3
Correspondence to: Aviva Petrie, Senior Lecturer in Statistics, Biostatistics Unit, Eastman Dental Institute for Oral Health Care Sciences, University College London, 256 Gray's Inn Road, London WC1X 8LD
A systematic review of research evidence is an efficient approach to integrating existing information, invariably a multiplicity of published articles, with a view to establishing whether the scientific findings are consistent. If so, it may be possible to draw conclusions and make recommendations about treatment regimens or observed effects which have greater credence than those obtained from individual studies. The systematic review relies on a specified checklist which determines which articles should be included in the review, and how each should be critically appraised to provide relevant information relating to the focus of the review.
SYSTEMATIC REVIEWS
The report of a systematic review is somewhat like that of a research paper; it contains a clear description of the aims, and the material and methods used by the reviewer. The alternative haphazard non-systematic review has no defined rules concerning the process of digesting the mass of information, and is open to abuse. A systematic review serves various purposes:
The Cochrane Collaboration (www.update-software.com/ccweb/cochrane/general.htm) is an international network of individuals and institutions which prepares systematic reviews of randomised controlled studies and of observational evidence. It helps to promote the development of systematic reviews by setting explicit standards for them. It provides a framework within which scientists of like interests can collaborate and, through its publication, the Cochrane Database of Systematic Reviews, allows electronic access to the latest detailed and highly structured reports on subjects of interest.
META-ANALYSIS
A special form of systematic review is a meta-analysis (sometimes called an overview); this is a statistical approach to combining the results from separate but similar studies to provide an overall quantitative summary of the effect of interest. A meta-analysis is thus a statistical analysis of a collection of statistical analyses from individual studies. Full details of the theory of meta-analysis may be obtained in Hedges and Olkin (1985).1 In addition, a paper by Song et al. (1997)2 provides a useful discussion of how to handle discrepancies in recommendations arising from different meta-analyses of what appear to be the same research question. In principle, a meta-analysis proffers the advantages of increased power, and increased precision of its estimates, when compared with a single study. In practice, the meta-analysis is open to criticism, essentially on four grounds (Glass et al., 1981):3
EXAMPLE
Meta-analyses in dentistry are not very common. The example used in this paper, a meta-analysis by van Rijkom et al. (1998),6 can be criticised but is, nevertheless, thorough and accessible. The authors used a meta-analysis to estimate the overall caries inhibiting effect of fluoride gels applied to the permanent teeth of children aged 6 to 15 years. Each of the 19 studies included in the analysis, referenced by study number at the end of this paper, was obtained from a MEDLINE search of the published literature of English and German studies. All these studies satisfied various inclusion and exclusion criteria, and their follow-up periods were between 1.5 and 3 years (median 3 years). In particular, each of the chosen studies was a randomised controlled trial in which the effect of the fluoride gel treatment was compared with no treatment or placebo treatment. In fact, some of the 19 studies were independent substudies of a larger study which had been split into two to reflect differences in general fluoride regimen.
The inhibiting effect of the treatment was expressed for each study by the prevented fraction (PF); this was calculated as the difference between the incidence of decayed, missing and filled surfaces (DMFS) in the control group (lc) and the incidence in the experimental group (le), divided by that in the control group [ie PF = (lc - le)/lc]. The absolute difference between the incidences in the two groups was standardized (ie divided by lc) since the PF was assumed to be less sensitive to experimental circumstances, such as the age range of the study population and the duration of the study, than (lc - le).
THE EFFECT OF INTEREST IN A META-ANALYSIS
Explaining the effect of interest
There are two approaches to combining the information in a meta-analysis. The parametric approach is usually adopted; this assumes that the effect of interest in each study is Normally distributed.
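As an illustration of the prevented fraction defined in the example above, the following minimal sketch evaluates PF = (lc - le)/lc for a single study. The function name and the incidence values are hypothetical and are not taken from any of the 19 studies.

```python
def prevented_fraction(incidence_control: float, incidence_experimental: float) -> float:
    """Return PF = (lc - le) / lc for one study.

    incidence_control      -- DMFS incidence in the control group (lc)
    incidence_experimental -- DMFS incidence in the experimental group (le)
    """
    if incidence_control <= 0:
        raise ValueError("control-group incidence must be positive")
    return (incidence_control - incidence_experimental) / incidence_control


# Hypothetical incidences chosen only to make the arithmetic easy to follow.
pf = prevented_fraction(incidence_control=4.0, incidence_experimental=3.0)
print(f"PF = {pf:.0%}")  # PF = 25%
```

A PF of zero corresponds to equal incidence in the two groups, and a negative PF to a higher incidence in the experimental group than in the control group.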
Note that both the difference between the means and its standardized difference are Normally distributed for Normally distributed data; similarly, the logarithm of the relative risk, equal to the difference in the logarithms of the two risks, is approximately Normally distributed. The PF in this example is assumed to be approximately Normally distributed. The parametric approach focuses on combining the results from the k studies, estimating the overall effect of interest, with its confidence interval, testing its significance and interpreting these results. Occasionally, a non-parametric approach is used which makes no distributional assumptions about the effect of interest. However, the non-parametric methods often require the raw data from each study, which can limit their use, and they are not described here.
Displaying the effect of interest
The most usual pictorial representation (Fig. 1) is sometimes called a 'forest plot'. It shows the estimated effect of interest (in the fluoride gel example it is the PF but might, in other circumstances, be the standardized difference in means or the odds ratio) for each of the separate studies in the meta-analysis. The confidence intervals for the true effect in each case, as well as the overall estimated effect (and related confidence interval) from the pooled data from all the studies, are also indicated. An explanation of the method used to calculate the overall estimated effect is given in the section entitled 'Calculating the effect of interest'. A vertical line, known as the 'line of no effect', is sometimes drawn in the diagram. It represents equal effectiveness of the treatments (for example, it would correspond to a value of zero for PF or a difference in means, or unity if the effect of interest were the odds ratio).
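The reading of a forest plot against the line of no effect can be sketched in a few lines. This is only an illustration: it assumes that each study supplies an estimate and a standard error, that a Normal approximation gives the 95% confidence interval, and the study names and numbers are hypothetical.

```python
def confidence_interval(estimate: float, standard_error: float, z: float = 1.96):
    """Approximate 95% confidence interval under a Normal approximation."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


def crosses_no_effect(interval, no_effect: float = 0.0) -> bool:
    """True if the interval contains the line of no effect.

    no_effect is 0 for a PF or a difference in means, and 1 for an odds ratio
    (whose interval is usually calculated on the log scale and transformed back).
    """
    lower, upper = interval
    return lower <= no_effect <= upper


# Hypothetical PFs (%) and standard errors, not taken from the 19 studies.
studies = {"study A": (30.0, 10.0), "study B": (12.0, 9.0)}
for name, (pf, se) in studies.items():
    lower, upper = confidence_interval(pf, se)
    verdict = "crosses" if crosses_no_effect((lower, upper)) else "excludes"
    print(f"{name}: PF {pf:.0f}% ({lower:.1f}% to {upper:.1f}%) {verdict} the line of no effect")
```

Here the interval for study A lies wholly to the right of zero, whereas that for study B includes zero.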
In the fluoride gel example, only five of the confidence intervals for the true PFs cross the line of no effect, whereas twelve of the confidence intervals are to the right of it; this suggests that fluoride gel is an effective inhibitor of caries. It is possible to get some idea of whether the estimates of the effects from the different studies are compatible by 'eye-balling' the forest plot. If the confidence intervals of the effects overlap, then the trials are likely to be compatible, whereas if there is no overlap, then they are incompatible. The confidence intervals in Fig. 1 show considerable overlap, suggesting that the results of the different studies are likely to be compatible. However, it should be noted that the estimated PFs show substantial variation, so this conclusion should be viewed with caution.
Checking for compatibility between the trials
A more formal approach to determining incompatibility is to perform a statistical test, described in detail in the section entitled 'Testing for homogeneity'. If, on the basis of the test result, the observed effects are more dispersed than would be expected on the basis of chance alone, statistical heterogeneity is said to be present, ie the estimated effects exhibit considerable variation and are incompatible. Statistical heterogeneity may be caused by clinical heterogeneity or methodological differences, or it may be related to unknown trial characteristics. The presence of statistical heterogeneity is indicated if the test of statistical homogeneity (homogeneity implies that the effects are equal) is significant. If the test is not significant, this does not imply that there must be statistical homogeneity. A non-significant result implies only that there is no evidence to reject the null hypothesis of homogeneity, and not that there is evidence to accept it. It should be pointed out that the test of statistical homogeneity has low power and, therefore, may fail to produce a statistically significant result unless there is marked heterogeneity. Whether or not the test is significant, it is important to provide an estimate of the extent to which there is statistical heterogeneity. Then, if this estimate indicates that there might be substantial statistical heterogeneity, the aspects of clinical heterogeneity which may be causing it should be investigated.
Calculating the effect of interest
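The overall estimate of the effect of interest is a weighted mean of the estimates from the k studies,
$$\hat{\theta} = \frac{\sum_{i=1}^{k} w_i \hat{\theta}_i}{\sum_{i=1}^{k} w_i}$$
where $\hat{\theta}_i$ is the estimated effect in the ith study (i = 1, 2, 3, . . ., k) and $w_i$ is its weight, the reciprocal of the variance of that estimate. In the fixed-effects approach, this variance is a measure only of the random variation within the study; in the random-effects approach, it comprises both the random variation within the study and the variation between the estimated effects in the k studies.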
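The fixed-effects calculation can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used by van Rijkom et al.: the function name and the data are hypothetical, each weight is taken to be the reciprocal of the within-study variance, and the sketch also returns the two chi-squared statistics treated under 'Hypothesis tests in meta-analysis'.

```python
import math


def fixed_effects_summary(estimates, variances):
    """Inverse-variance (fixed-effects) pooling of k study estimates."""
    if len(estimates) != len(variances) or not estimates:
        raise ValueError("need one variance per estimate and at least one study")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    weights = [1.0 / v for v in variances]            # w_i = 1 / variance_i
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)    # SE of the pooled estimate
    ci = (pooled - 1.96 * standard_error, pooled + 1.96 * standard_error)

    # Homogeneity statistic, referred to chi-squared with k - 1 degrees of freedom.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    # Overall-effect statistic, referred to chi-squared with 1 degree of freedom.
    chi2_overall = pooled ** 2 * total_weight
    # Upper tail of chi-squared(1): P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_overall = math.erfc(math.sqrt(chi2_overall / 2.0))

    return {"pooled": pooled, "ci": ci, "q": q, "df": len(estimates) - 1,
            "chi2_overall": chi2_overall, "p_overall": p_overall}


# Hypothetical PFs (%) and their variances, not taken from the 19 studies.
result = fixed_effects_summary([10.0, 20.0, 30.0], [25.0, 25.0, 100.0])
print(f"pooled PF = {result['pooled']:.2f}%")
print(f"95% CI = {result['ci'][0]:.2f}% to {result['ci'][1]:.2f}%")
print(f"Q = {result['q']:.2f} on {result['df']} degrees of freedom")
print(f"overall chi-squared = {result['chi2_overall']:.2f}")
```

A random-effects analysis would use the same weighted mean with each variance enlarged by an estimate of the between-study variation; that estimate is not computed in this sketch.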
Note that for both the fixed-effects and random-effects approaches, an approximate 95% confidence interval for the overall effect, θ, is given by
$$\hat{\theta} \pm \frac{1.96}{\sqrt{\sum_{i=1}^{k} w_i}}$$
where $\hat{\theta}$ is the overall estimate of the effect and $w_i$ is the weight given to the ith of the k studies.
The authors in the fluoride gel example used two approaches to investigate heterogeneity. Firstly, they believed that the large overlap in the confidence intervals in the forest plot was an indication that there was no evidence of statistical heterogeneity. However, although there was considerable overlap in the confidence intervals, the estimates from the different studies showed substantial variation. Secondly, they used a multiple regression analysis to determine whether there were factors influencing the caries-inhibiting effect of fluoride gel application. This analysis showed no significant influence of the covariables that were thought to be relevant, namely, 'application frequency', 'application methods' (tray/brush), 'baseline caries prevalence' and 'general fluoride regimen'. Thus, they concluded that all studies could be regarded as equally effective, and the overall effect could be estimated using a fixed-effects model. The weight for each study was chosen to be the inverse of the variance of the prevented fraction, PF; this gave an estimated overall PF of 22%. The 95% confidence interval for the true PF was 18% to 25%, the shaded area in Fig. 1; this excludes zero and suggests that fluoride gel was an effective inhibitor of caries in children of this age.
HYPOTHESIS TESTS IN META-ANALYSIS
There are two hypothesis tests that are of crucial importance in a meta-analysis, one which tests for homogeneity of the effects of interest and the other which tests the significance of the overall treatment effect.
Testing for homogeneity
The null hypothesis that the true effects in the k studies are equal is tested using the statistic
$$Q = \sum_{i=1}^{k} w_i \left(\hat{\theta}_i - \hat{\theta}\right)^2$$
where $\hat{\theta}_i$ is the estimated effect in the ith study and $\hat{\theta}$ is the overall estimate; Q is assumed to follow a chi-squared distribution with (k-1) degrees of freedom. As homogeneity is assumed under the null hypothesis in this test, the fixed-effects and the random-effects approaches are not distinguished, and the weight, $w_i$, is the same as that used in the fixed-effects approach, ie it is the reciprocal of the variance of the effect in the ith study (i = 1, 2, 3, . . ., k), where that variance is a measure only of the random variation within the study.
It is interesting to note that, even though the confidence intervals for the PFs from the separate studies overlap (Fig. 1), the chi-squared test of homogeneity in the fluoride gel example gives a result which is marginally significant at the 5% level. This, together with the view that the different studies have estimated PFs which show considerable variation, should perhaps be an indication that combining the estimates of PF is questionable and that the overall estimate of the PF should be interpreted with caution.
In addition to the overall test which investigates heterogeneity (in fact, it tests homogeneity), it is possible to test for funnel plot asymmetry which assesses bias. Details may be obtained from Egger et al. (1997).7 A funnel plot (Fig. 2) is a scatter plot of the sample size against the treatment effect estimate generated in an individual study. The sample size can be replaced by the precision of the estimated effect. Since the precision of the estimated treatment effect increases as the sample size of the component study increases, the results from small studies would be expected to show a wide scatter at the bottom of the graph, with the spread decreasing (narrowing to produce a funnel effect) at the top of the graph for the larger studies. The funnel plot will often be skewed and asymmetrical if bias is present, as demonstrated in Fig. 2.
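One way of putting a number on funnel plot asymmetry is the regression approach associated with the Egger et al. paper cited above: each estimate divided by its standard error is regressed on its precision (the reciprocal of the standard error), and an intercept far from zero suggests asymmetry. The sketch below is a minimal unweighted version under that assumption; it is not the analysis reported for the fluoride gel example, the function name and data are hypothetical, and no significance test of the intercept is attempted.

```python
def egger_intercept(estimates, standard_errors):
    """Return (intercept, slope) of the regression of estimate/SE on 1/SE."""
    if len(estimates) != len(standard_errors) or len(estimates) < 3:
        raise ValueError("need at least three studies, one standard error each")
    precision = [1.0 / se for se in standard_errors]
    deviate = [e / se for e, se in zip(estimates, standard_errors)]

    n = len(precision)
    mean_x = sum(precision) / n
    mean_y = sum(deviate) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(precision, deviate))

    slope = sxy / sxx                      # estimates the underlying effect
    intercept = mean_y - slope * mean_x    # measures the asymmetry
    return intercept, slope


# Hypothetical PFs (%) and standard errors in which the smaller studies
# (larger standard errors) report the larger effects.
intercept, slope = egger_intercept([20.0, 26.0, 38.0, 45.0], [4.0, 8.0, 12.0, 16.0])
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

When every study estimates the same effect exactly, the points lie on a line through the origin and the intercept is zero; as with the funnel plot itself, the measure is of limited value when only a few trials are available.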
The lower left corner of the funnel is somewhat empty (ie lacking publications), indicating that some studies with small sample sizes and small effects are probably missing. The effect of this publication bias on the overall PF, however, is likely to be marginal, because the weight of such unpublished low-power studies is small. It is possible to measure the degree of asymmetry in a funnel plot, but the approach is of limited value if only a few trials are included (remembering that the unit of analysis is the randomised trial and not its patients).
Testing the treatment effect
The null hypothesis that the true overall effect is zero is tested using the statistic
$$\chi^2 = \frac{\left(\sum_{i=1}^{k} w_i \hat{\theta}_i\right)^2}{\sum_{i=1}^{k} w_i}$$
which follows the chi-squared distribution with one degree of freedom. Here $\hat{\theta}_i$ is the estimated effect in the ith study and each $w_i$ is the reciprocal of the variance of that estimate. In the fixed-effects approach, this variance is a measure only of the random variation within the study; in the random-effects method, the variance comprises both the random variation within the study and the variation between the estimated effects in the k studies.
The test for the overall PF in the fluoride gel example gives a highly significant result (P < 0.001) indicating that the overall effect, estimated by a PF of 22% (95% confidence interval equal to 18% to 25%), is significantly different from zero. This implies that the fluoride gel is an effective inhibitor of caries in children of this age. Note that, as observed previously, zero lies outside the 95% confidence interval for the overall PF, as expected if the test result is significant.
Acknowledgement
Table 1 Summary of study results arranged according to application method and application frequency6
Fig. 1 Caries-inhibiting effect of fluoride gel treatment (PF with 95% confidence intervals) in 19 studies. The shaded area shows the 95% confidence interval for the pooled PF.6
Fig. 2 Funnel plot of the sample size for each study plotted against the estimated prevented fraction, PF (%) for the 19 studies included in the meta-analysis
Refereed paper.