Promise and pitfalls in the meta-analysis of genetic association studies: a response to Sen and Schinka

SIR—The two responses1, 2 to our recent meta-analysis3 offer an opportunity to revisit our analysis and review critically the meta-analysis methodology we employed. In addition, they illustrate the growing interest in meta-analytic techniques, both as a means by which the disparate literature regarding candidate gene studies of psychiatric and behavioural phenotypes may be reconciled to some degree, and as a means by which the perennial problem of lack of statistical power in such research can be addressed.4

What is apparent from our meta-analysis and the subsequent correspondence by Sen et al1 and Schinka2 is that different approaches to statistical analysis can, in meta-analysis as well as primary research, yield somewhat different results. One issue lies in the conversion of continuous scales to standard units. Where this conversion is not possible, the most commonly used (and preferred) approach to combining studies is to calculate standardised mean differences.5 Within this class of approaches, the most popular methods are Cohen's d, Hedges' adjusted g, and Glass's Δ.5 A recent review of definitions, assumptions and comparisons is available.5 In revisiting our analysis, we applied Cohen's d, pooled under a fixed effects model because there was no significant between-study heterogeneity, and found results similar to those reported by Schinka.2 These results are presented in Table 1, and show a strong dominant effect of 5-HTT LPR on NEO neuroticism, and a more modest but nevertheless significant recessive effect on TCI/TPQ harm avoidance.
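As a minimal sketch of the three standardised mean difference estimators named above, and of inverse-variance pooling under a fixed effects model, the following Python may be helpful. The function names are hypothetical, the variance formula is a common large-sample approximation rather than necessarily the one used in our analysis, and any numbers passed in are illustrative rather than taken from the studies discussed here.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled within-group standard deviation of two genotype groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Hedges' adjusted g: Cohen's d with an approximate small-sample correction."""
    df = n1 + n2 - 2
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d(m1, sd1, n1, m2, sd2, n2) * correction


def glass_delta(m1, m2, sd_reference):
    """Glass's delta: mean difference divided by the reference group's SD only."""
    return (m1 - m2) / sd_reference


def smd_variance(d, n1, n2):
    """Large-sample approximation to the variance of a standardised mean difference."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return pooled, standard_error
```

The three estimators differ only in the standard deviation used as the denominator and in the small-sample correction, which is why they tend to agree closely when group sizes are large and variances are similar.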

Table 1 Standardised mean differences

Sen et al1 adopted a different approach and chose to standardise the means and standard deviations for each study, rather than standardising the difference. We attempted to replicate this approach in our recent meta-analysis,3 and our standardised scores are consistent with those of Sen et al.1 In our revisited analysis, using this approach of standardised scores, we find evidence of a dominant effect of 5-HTT LPR on NEO neuroticism (P<0.01), and a recessive effect on TCI/TPQ harm avoidance (P<0.01), which is consistent with our analysis of standardised mean differences (Table 1). However, standardisation to an overall mean and standard deviation may be problematic for at least two reasons. First, there are a number of different ways of achieving that goal. Second, the prespecified mean and standard deviation parameters must be justified scientifically, and changes in their specification may affect the final results. There is scope for methodological work in this area, and we suggest that standardised mean differences should be presented in most settings.
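The sensitivity to prespecified parameters can be illustrated with a short sketch. It shows one of several possible ways of rescaling a genotype group's mean and standard deviation to a common scale, and is not intended to reproduce the exact procedure of Sen et al. The function name, the target scale (mean 50, standard deviation 10) and the input values are all assumptions chosen for illustration.

```python
def standardise_group(mean, sd, ref_mean, ref_sd, target_mean=50.0, target_sd=10.0):
    """Rescale a group's mean and SD to a common scale.

    ref_mean and ref_sd are the prespecified reference parameters, for
    example the study's overall mean and SD or published instrument norms.
    """
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    z_mean = (mean - ref_mean) / ref_sd
    return target_mean + target_sd * z_mean, target_sd * sd / ref_sd


# Illustrative values only: the same raw group summary under two
# different choices of reference standard deviation.
narrow = standardise_group(mean=24.0, sd=8.0, ref_mean=20.0, ref_sd=8.0)
wide = standardise_group(mean=24.0, sd=8.0, ref_mean=20.0, ref_sd=10.0)
print(narrow, wide)
```

The same raw data yield different standardised means and standard deviations under the two reference choices, which is why the reference parameters need explicit scientific justification.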

In our previous meta-analysis,3 we also applied (weighted) linear mixed model regression machinery to increase the efficiency and, possibly, the power of our analysis, as well as to adjust for confounding covariates. The means (and estimates of precision) presented3 were estimated directly from the model, but the calculations of Sen et al1 are more standard. Although adjusting for covariates in regression models may highlight possible sources of heterogeneity, the results from adjusting for covariates and subgroup analyses in a meta-regression setting must be viewed with caution.6 A large number of studies is required, particularly if analysing standardised mean differences, as regression coefficients become unstable where there are fewer than five observations per covariate in the model.7
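A minimal sketch of the idea follows, assuming a single study-level covariate and inverse-variance weights. This is a simple fixed-effect weighted regression, deliberately simpler than the linear mixed model used in our earlier analysis, and the helper names are hypothetical. The second function encodes the rule of thumb of five observations per covariate mentioned above.

```python
def weighted_meta_regression(effects, variances, covariate):
    """Inverse-variance weighted least squares of effect size on one covariate.

    Returns (intercept, slope). Standard errors are omitted for brevity.
    """
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, covariate)) / total
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, covariate))
    if sxx == 0:
        raise ValueError("covariate does not vary across studies")
    sxy = sum(
        w * (x - x_bar) * (y - y_bar)
        for w, x, y in zip(weights, covariate, effects)
    )
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def enough_studies(n_studies, n_covariates, minimum_per_covariate=5):
    """Rule-of-thumb check: at least five studies per covariate in the model."""
    return n_studies >= minimum_per_covariate * n_covariates
```

For example, `enough_studies(8, 2)` returns `False`, flagging that a two-covariate meta-regression on eight studies falls below the threshold.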

Our re-evaluation has therefore highlighted a number of important methodological and reporting issues. First, while the theory of pooling data to derive precise estimates of effect size is fairly sound, it presumes that differences among the various studies are primarily due to chance, which need not be true. Studies are performed on different populations, and different subpopulations representing various characteristics. Genetic associations may operate differently in different subpopulations, and individual studies may use different inclusion and exclusion criteria, and measure outcome differently (eg, by employing different personality questionnaire instruments). The use of meta-analysis in cohort studies is therefore prone to bias, particularly because adjusting for confounding or moderating variables is difficult.8 Specific properties related to the genetic nature of the data, such as Hardy–Weinberg equilibrium, must be considered. Also, measurement error in individual scoring systems when investigating psychiatric and behavioural phenotypes should be considered, and if possible, sensitivity analyses should be performed. Many authors feel that the unthinking pooling of data can generate misleading results by ignoring meaningful heterogeneity among studies, and introducing further biases through the process of selecting trials.9, 10 It is now widely felt that meta-analyses should attempt to evaluate heterogeneity rather than just drown all differences by pooling data.11
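One common way of evaluating heterogeneity, rather than ignoring it, is to compute Cochran's Q and the derived I² statistic. The sketch below assumes per-study effect sizes and their variances are available. The function name is hypothetical, and the choice of Q and I² is ours for illustration rather than a method prescribed by the references cited above.

```python
def heterogeneity(effects, variances):
    """Cochran's Q, its degrees of freedom, and I-squared as a percentage."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared is the share of total variation beyond what chance predicts,
    # truncated at zero when Q does not exceed its degrees of freedom.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared
```

Under the null hypothesis of homogeneity, Q is referred to a chi-square distribution with k − 1 degrees of freedom, where k is the number of studies. That test has low power when k is small, so a non-significant Q should not be read as proof of homogeneity.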

Second, a meta-analysis can only be as good as the individual studies that contribute to it. In general, many genetic association studies are underpowered,12 particularly when looking for small genetic effects, which are likely in the case of psychiatric and behavioural phenotypes. Also, there are some differences between the frequency counts in our meta-analysis and those reported in the subsequent commentaries of Sen et al1 and Schinka,2 which highlights potential reporting problems. Important information is sometimes not given (eg, deviation from Hardy–Weinberg equilibrium13), and this may be problematic if this information relates to assumptions underlying the meta-analysis or the quality of the data. For example, the symmetric distribution assumption underlying the analysis of continuous outcomes may be difficult to check without the reporting of medians or specific skewness tests in the text.
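Where genotype counts are reported, deviation from Hardy–Weinberg equilibrium can be checked directly. The following is a minimal sketch of the usual one-degree-of-freedom chi-square test for a biallelic marker. The function and argument names are hypothetical (long and short alleles are labelled l and s here), and the chi-square approximation is unreliable when expected counts are small, in which case an exact test would be preferable.

```python
import math


def hwe_chi_square(n_ll, n_ls, n_ss):
    """Chi-square test (1 df) for deviation from Hardy-Weinberg equilibrium."""
    n = n_ll + n_ls + n_ss
    if n == 0:
        raise ValueError("no genotypes supplied")
    p = (2 * n_ll + n_ls) / (2 * n)  # frequency of the l allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("marker is monomorphic in this sample")
    observed = (n_ll, n_ls, n_ss)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Reporting the three genotype counts for each study allows readers to repeat this check themselves.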

Third, it is well known that studies with statistically significant outcomes are more likely to be published than nonsignificant studies.9 Small studies are also less likely to be published than large studies,9 and negative studies have been shown to take longer to appear in print.9 Another problem is covert duplicate publication, where the same study data are published more than once.9 We found evidence of both publication bias and covert duplicate publication in our original meta-analysis.14 It has also been documented that papers that appear in languages other than English are more likely to be excluded from meta-analyses.9 All these biases are important and can invalidate the results of meta-analyses and, unfortunately, formal tests of publication bias are relatively weak.
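One widely used formal test is a regression test for funnel plot asymmetry, in which each study's standardised effect (effect divided by its standard error) is regressed on its precision (the reciprocal of the standard error), and an intercept far from zero suggests small-study effects. The sketch below is an illustration of that general technique under our own assumptions. The function name is hypothetical, and it is not necessarily the test applied in our original meta-analysis.

```python
import math


def asymmetry_regression(effects, standard_errors):
    """Ordinary least squares of effect/SE on 1/SE.

    Returns (intercept, intercept_se, slope). An intercept far from zero
    relative to its standard error suggests funnel plot asymmetry.
    """
    k = len(effects)
    if k < 3:
        raise ValueError("at least three studies are needed")
    x = [1.0 / se for se in standard_errors]
    z = [e / se for e, se in zip(effects, standard_errors)]
    x_bar = sum(x) / k
    z_bar = sum(z) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all studies have the same precision")
    slope = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z)) / sxx
    intercept = z_bar - slope * x_bar
    residual_ss = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    residual_var = residual_ss / (k - 2)
    intercept_se = math.sqrt(residual_var * (1.0 / k + x_bar ** 2 / sxx))
    return intercept, intercept_se, slope
```

With only a handful of studies the intercept is estimated imprecisely, which is one concrete sense in which such tests are weak.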

Meta-analyses are therefore by no means perfect, for the reasons described above. Very large, well-designed primary studies remain the most reliable way of obtaining reproducible results. One of the reasons for the recent scepticism regarding the value of meta-analyses is the discrepancy between the results of meta-analyses and subsequent large studies, particularly in the context of clinical trials. For instance, in a meta-analysis,15 it was shown that magnesium given intravenously for acute myocardial infarction decreases mortality in hospital, but a subsequent large-scale trial, ISIS-4,16 in 1995, did not show any such effect. Similarly, aspirin was shown to prevent pre-eclampsia in a meta-analysis,17 but a subsequent large-scale trial, CLASP,18 in 1994, did not show such a protective effect. Several authors have attempted to quantify the discrepancies between meta-analyses and subsequent large studies, and have found that approximately 80% directionally agreed with the result from larger trials.19 More recently, a comparison of 12 randomised controlled trials with 19 previous meta-analyses20 found that the results of the meta-analyses would have led to the adoption of an ineffective treatment in 32% of cases, and the rejection of a useful treatment in 33%. There have been attempts to explain these discrepancies using various models.10, 11, 21

Given the complexities associated with meta-analysis, the data contributed and methodology applied should be both transparent and standard, so that they are not only reproducible, but can be reviewed with a critical eye. Recommendations regarding the minimum information that should be reported in a meta-analysis have been made.8 What is of central importance is to appreciate that the conclusions of a meta-analysis are by no means definitive, and the apparent authority that these methods lend to consequent results is potentially dangerous. There are well-known cases of the conclusions of apparently adequate and comprehensive meta-analyses being subsequently shown to be false following the completion of large, definitive studies. The findings of Sen et al,1 Schinka2 and Munafò et al3 indicate that there is a degree of uncertainty regarding the effect of 5-HTT LPR genotype on anxiety-related traits using the scoring systems considered, and that the measurement instrument may moderate this association. A large study with sufficient power and employing multiple phenotype measures would be a definitive means of resolving this complex and intriguing issue.


  1. Sen J et al. Mol Psychiatry, in press.
  2. Schinka JA. Mol Psychiatry, in press.
  3. Munafò MR et al. Mol Psychiatry 2005; 10: 415–419.
  4. Munafò MR, Flint J. Trends Genet 2004; 20: 439–444.
  5. Deeks JJ et al. In: Egger M, Davey Smith G, Altman D (eds). Systematic Reviews in Health Care: Meta-Analysis in Context, 2nd edn. BMJ Books: London, 2001, pp 285–312.
  6. Davey-Smith G, Egger M. In: Egger M, Davey Smith G, Altman D (eds). Systematic Reviews in Health Care: Meta-Analysis in Context, 2nd edn. BMJ Books: London, 2001, pp 143–156.
  7. Peduzzi PN et al. J Clin Epidemiol 1995; 48: 1503–1510.
  8. Stroup DF et al. J Am Med Assoc 2000; 283: 2008–2015.
  9. Naylor DC. BMJ 1997; 315: 617–619.
  10. Ioannidis JP, Cappelleri JC, Lau J. J Am Med Assoc 1998; 279: 1089–1093.
  11. Lau J et al. Lancet 1998; 351: 123–127.
  12. Munafò MR et al. Trends Genet 2005; 21: 269–271.
  13. Salanti G et al. Eur J Hum Genet 2005. April 13 [Epub ahead of print].
  14. Munafò MR et al. Mol Psychiatry 2003; 8: 471–484.
  15. Teo KK, Yusuf S. Drugs 1993; 46: 347–359.
  16. ISIS-4 Collaborative Group. Lancet 1995; 345: 669–687.
  17. Imperiale TF, Petrulis AS. J Am Med Assoc 1991; 266: 260–264.
  18. CLASP Collaborative Group. Lancet 1994; 343: 619–629.
  19. Villar J et al. Lancet 1995; 345: 772–776.
  20. LeLorier J et al. N Engl J Med 1997; 337: 536–542.
  21. Egger M, Davey-Smith G. BMJ 1995; 310: 752–754.

Author information

Correspondence to M R Munafò.
