Abstract
The number of published genetic association studies (GASs) is increasing rapidly owing to the availability of mapped single-nucleotide polymorphisms (SNPs) and advances in genotyping technologies. A search in HuGENet illustrates the rapid accumulation of evidence for major diseases. Recently, genome-wide association studies (GWASs) have attracted considerable activity, and a growing number of forthcoming studies is expected. GASs and GWASs are usually underpowered to detect significant associations, and the varying quality of reporting in publications confuses researchers. A meta-analysis can increase power and provide standards for reporting results. However, the conduct of a meta-analysis of GASs faces a major obstacle: the structure and diversity of the information stored in databases. Similar problems are expected for GWASs, though the data are not yet publicly available. The development of a Web-based system for the detailed and structured recording of GAS or GWAS data, accompanied by an estimation of the overall genetic risk effects, would enable scientists to keep track of evidence for gene–disease associations.
Introduction
Genetic association studies (GASs), or candidate–gene studies, assess the association between disease status and genetic variants (gene polymorphisms) in a population. They have been particularly popular for investigating complex diseases, and the number of papers on GASs has increased tremendously. This trend is expected to accelerate because of the rapidly increasing availability of mapped single-nucleotide polymorphisms (SNPs) and advances in genotyping technologies (Donahue and Allen 2005). Dealing with all of the accumulated evidence, especially when studies frequently produce controversial or inconclusive results, is a major challenge. Meta-analysis provides a tool to estimate population-wide genetic risk effects for pertinent gene–disease associations, thereby helping to resolve contradictory results and to decrease the uncertainty of the estimated risk. Since 2002, the molecular research of genotype–phenotype associations has entered the genome-wide era, as a consequence of the completion of the Human Genome and HapMap projects and the development of ultra-high-volume genotyping chip technologies. In contrast to GASs, genome-wide association studies (GWASs) are hypothesis-free (i.e., unbiased) and involve a massive scan of the genome with a dense set of SNPs (up to 500,000) in a single experiment, in search of causal variants. Thus, GWASs represent a comprehensive alternative to GASs when there is a lack of evidence regarding the function or the location of the causal genes. However, the vast amount of data produced by GWASs represents a major methodological challenge both in their primary analysis and in their meta-analysis (Thomas and Witte 2002; Hirschhorn and Daly 2005; Zintzaras and Lau 2007).
Accumulation of evidence
A search in HuGENet (http://www.cdc.gov/genomics/hugenet/default.htm) conducted for the period January 2000 to December 2006 found 535, 887, 351, 694, and 249 GASs published in hypertension (Zintzaras et al. 2006a, 2006b), schizophrenia (Zintzaras 2006a, 2006b, 2007), Parkinson’s disease (Zintzaras and Hadjigeorgiou 2004, 2005), breast cancer (Zintzaras 2006a, 2006b), and alcoholism (Zintzaras et al. 2006a, 2006b), respectively (see Fig. 1). These five topics are among many common complex diseases that have a major impact on public health and whose underlying pathogenetic mechanisms are not clearly understood. In breast cancer and Parkinson’s disease, the number of studies published each year after 2002 is consistently above 100 and 50, respectively, and in schizophrenia, there is an upward trend, with an average growth rate of 75% per year. In hypertension and alcoholism, the annual number of published studies is on average 76 and 35, respectively. Therefore, the same or an even larger number of publications of GASs in these topics can be expected in the coming years, and the pattern of published GASs is likely to be similar for other topics.
A search in PubMed for GWASs (August 2007) has yielded 36 articles describing 42 studies for various multifactorial diseases, such as coronary artery disease, Crohn’s disease, Parkinson’s disease, age-related macular degeneration, and Alzheimer’s disease. Moreover, an increasing number of funded NIH and European initiatives have been undertaken in this direction during the last few years, leading to a growing number of forthcoming and promising studies (Thomas and Witte 2002).
Validity of genotype–phenotype associations
The published literature has demonstrated a plethora of questionable gene–disease associations, based on both the candidate gene and the genome-wide approach, the replication of which has often failed in independent studies (NCI-NHGRI Working Group on Replication in Association Studies 2007). Although replication is essential for establishing the credibility of a genotype–phenotype association, a number of methodological and design issues in the initial studies are hampering research.
Small sample size is a frequent problem and can result in insufficient power to detect minor contributing roles of one or more alleles. The most realistic genetic association between a polymorphic locus and a disease has been claimed to yield an odds ratio between 1.1 and 1.5. Therefore, to achieve a satisfactory power (>80%) to identify a modest genetic effect (odds ratio 1.2) of a polymorphism present in 10% of individuals, a sample size of 10,000 subjects or more would be needed for a GAS. Likewise, for a GWAS testing 500,000 SNP associations in a case–control study at a 5% significance level, a Bonferroni correction would require significance at p = 0.05/500,000 = 10⁻⁷. Then, to attain a 95% power for OR = 1.2 and a minor allele frequency of 10%, 15,000 case–control pairs are required in a single study, leading to a significant financial burden (Thomas and Witte 2002; Wang et al. 2005). The meta-analysis of multiple studies clearly has a role in offering an analysis with a higher probability of detecting significant results (Munafò and Flint 2004).
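The arithmetic behind figures of this kind can be sketched in a few lines. What follows is a rough illustration and not the calculation used in the cited papers: it assumes an unmatched case–control design with equal group sizes, a two-sided test, and the standard normal-approximation formula for comparing two proportions. The function names `exposure_in_cases` and `n_per_group` are hypothetical, and treating alleles as the unit of analysis in the genome-wide case is an assumption made here.

```python
import math
from statistics import NormalDist


def exposure_in_cases(p_controls: float, odds_ratio: float) -> float:
    """Variant frequency in cases implied by the control frequency and an odds ratio."""
    odds = odds_ratio * p_controls / (1.0 - p_controls)
    return odds / (1.0 + odds)


def n_per_group(p_controls: float, odds_ratio: float, alpha: float, power: float) -> int:
    """Approximate number of units (subjects or alleles) per group, two-sided test."""
    p0 = p_controls
    p1 = exposure_in_cases(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_power = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_power * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))
    ) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Bonferroni-corrected threshold for 500,000 SNP tests at a 5% overall level.
bonferroni_alpha = 0.05 / 500_000

# Candidate-gene setting: variant carried by 10% of controls, OR = 1.2, 80% power.
gas_subjects_per_group = n_per_group(0.10, 1.2, alpha=0.05, power=0.80)

# Genome-wide setting: minor allele frequency 10%, OR = 1.2, 95% power.
# Assumption: alleles are the unit of analysis, so each subject contributes two.
gwas_alleles_per_group = n_per_group(0.10, 1.2, alpha=bonferroni_alpha, power=0.95)
gwas_subjects_per_group = math.ceil(gwas_alleles_per_group / 2)

print(f"Bonferroni threshold: {bonferroni_alpha:.1e}")
print(f"GAS subjects per group: {gas_subjects_per_group}")
print(f"GWAS subjects per group: {gwas_subjects_per_group}")
```

Under these assumptions the sketch yields sample sizes of the same order as those quoted above. Other design choices (matching, the genetic model, genotype-based rather than allele-based tests) would shift the numbers.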
Population stratification can be a confounding factor in gene–disease associations and arises when differences in the genetic structure of the underlying population are not taken into account (Zintzaras and Sakelaridis 2007). The cases and controls are then not matched for their genetic background, which can lead to biased or spurious results (Cardon and Palmer 2003). Moreover, design quality issues in individual studies, such as the definition of a phenotype, the validity of the genotyping method, and heterogeneity in exposure to environmental challenges, can increase the risk of bias (Zintzaras and Stefanidis 2005; Zintzaras et al. 2007). A meta-analysis provides a robust tool to investigate discrepant results, to decrease the uncertainty of the effect size of estimated risk, and to explore the heterogeneity between studies (Zintzaras and Ioannidis 2005; Zintzaras and Lau 2007).
The role of meta-analysis
The number of meta-analyses appearing in HuGENet is 9, 36, 10, 19, and 7, for hypertension, schizophrenia, Parkinson’s disease, breast cancer, and alcoholism, respectively. These meta-analyses can provide answers to the question of whether there is evidence of an association between gene polymorphism and disease. Table 1 shows the meta-analyses’ summary risk effects (odds ratios) for investigating associations between various gene polymorphisms and the five diseases of interest. The meta-analyses were based on all of the available studies at the time of performing the meta-analysis. In total, 96 gene polymorphisms were examined, and the number of studies included in the meta-analyses ranged from 2 to 48. Significant associations under any genetic contrast were found for four polymorphisms in hypertension, 26 in schizophrenia, four in Parkinson’s disease, 11 in breast cancer, and six in alcoholism. An association was considered to be significant when the p-value was less than 0.05 or the 95% confidence interval of the odds ratio did not include 1.0. So far (PubMed accessed August 2007), one meta-analysis in the field of GWASs for Parkinson’s disease has been published, synthesizing only three studies (Evangelou et al. 2007), but there is still a need for establishing a proper methodology for the meta-analysis of genome-wide rich data (Zintzaras and Ioannidis 2007).
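A summary odds ratio and the significance criterion described above can be illustrated with a minimal sketch. It assumes a fixed-effect, inverse-variance pooling of log odds ratios from 2×2 tables, which is one common approach and not necessarily the method used in the meta-analyses of Table 1. The study counts are invented for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

# (cases with variant, cases without, controls with variant, controls without)
Table = Tuple[int, int, int, int]


def log_odds_ratio(table: Table) -> Tuple[float, float]:
    """Return the log odds ratio and its variance (Woolf's method) for one study.

    Assumes no cell is zero; a real analysis would need a continuity correction.
    """
    a, b, c, d = table
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_odds_ratio(tables: List[Table], z: float = 1.96) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect summary OR with an approximate 95% CI."""
    weighted_sum = 0.0
    total_weight = 0.0
    for table in tables:
        log_or, variance = log_odds_ratio(table)
        weight = 1.0 / variance
        weighted_sum += weight * log_or
        total_weight += weight
    pooled = weighted_sum / total_weight
    se = math.sqrt(1.0 / total_weight)
    return math.exp(pooled), math.exp(pooled - z * se), math.exp(pooled + z * se)


def is_significant(ci_low: float, ci_high: float) -> bool:
    """Criterion from the text: the 95% CI of the odds ratio does not include 1.0."""
    return ci_low > 1.0 or ci_high < 1.0


# Hypothetical counts, invented purely for illustration.
example_studies = [(120, 380, 90, 410), (60, 140, 50, 150), (200, 800, 170, 830)]
summary_or, low, high = pooled_odds_ratio(example_studies)
print(f"Summary OR = {summary_or:.2f} (95% CI {low:.2f}-{high:.2f}), "
      f"significant: {is_significant(low, high)}")
```

A fixed-effect model is used here only for brevity. When studies are heterogeneous, a random-effects model would usually be preferred.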
The methodological issues relating to meta-analyses have been previously described in detail (Munafò and Flint 2004; Zintzaras and Lau 2007), and are beyond the scope of this paper. However, the benefit of getting an assessment of the overall risk effects from a meta-analysis is obvious. As evidence is accumulating rapidly, the updating of genetic risk effects can provide information on whether an association is real, or whether more evidence is needed in order to draw reliable conclusions on the association (Neale and Sham 2004). However, a meta-analysis requires a large amount of labor and effort, since it involves systematic searches in databases, article retrieval, data extraction, data entry, and data analysis (Hirschhorn et al. 2002).
Obstacles in meta-analysis
The major obstacle in undertaking a meta-analysis of GASs is the structure and diversity of stored information in databases, such as the Genetic Association Database (http://geneticassociationdb.nih.gov/cgi-bin/index.cgi), PubMed (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed), EMBASE (http://www.embase.com/), HuGENet, and OMIM (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM). The information provided in the databases is not structured and standardized appropriately for meta-analysis use, and the databases are not comprehensive. These databases, though important for promoting research, do not claim to be and are not a substitute for the meta-analysis of GASs. Thus, gathering and meta-analyzing data related to an SNP association with a disease/disorder is not straightforward. The minimum information that the articles should provide for meta-analysis is the genotype frequencies, the ‘race’ of samples, the study design, the demographic characteristics, and possible effect modifiers or confounders. In addition, the literature is filled with alternative, idiosyncratic, and arbitrary gene and SNP names and symbols, making cross-comparison and meta-analysis difficult [e.g., an SNP can be found under more than one arbitrary name, designating amino acid or nucleotide substitutions or something else, and rarely under the official dbSNP ID rs-number (http://www.ncbi.nlm.nih.gov/projects/SNP/)] (Kitsios and Zintzaras 2007; Zdoukopoulos and Zintzaras 2007). Much of the information that the biological researcher is interested in is available in public reference databases and in the millions of articles of the scientific research literature, mostly accessible via the Internet. It is estimated that about 80% of biological data are in text form, and even the abstracts are written in free text utilizing a complex biological vocabulary, which may vary significantly in different areas of research. Thus, despite their wide availability, these data are not generally machine-searchable.
Consequently, no substantial increase in the efficiency of data integration can be expected, for example, from the automation of PubMed data retrieval (Teufel et al. 2006; Lacroix 2002; Colhoun et al. 2003). Regarding GWASs, no publicly available database currently stores the enormous amount of accumulating data.
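To make the idea of structured recording concrete, the minimum information listed above could be held in a record such as the following. This is a hypothetical schema and not an existing database format: the class names, field names, and alias table are assumptions introduced for illustration. The alias entries map common names of the MTHFR C677T polymorphism to its dbSNP rs-number and should be checked against dbSNP before any real use.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Illustrative alias table; a real system would be populated from dbSNP.
SNP_ALIASES: Dict[str, str] = {
    "C677T": "rs1801133",
    "677C>T": "rs1801133",
    "ALA222VAL": "rs1801133",
}


def normalize_snp(name: str) -> Optional[str]:
    """Map a free-text SNP name to an rs-number, or None if it is unknown."""
    cleaned = name.strip()
    if cleaned.lower().startswith("rs") and cleaned[2:].isdigit():
        return cleaned.lower()
    return SNP_ALIASES.get(cleaned.upper())


@dataclass
class GenotypeCounts:
    """Counts of the three genotypes of a biallelic SNP in one group."""
    wild_type: int
    heterozygous: int
    homozygous_variant: int


@dataclass
class StudyRecord:
    """Minimum information for one study of one SNP, as listed in the text."""
    citation: str
    gene: str
    snp_as_reported: str
    cases: GenotypeCounts
    controls: GenotypeCounts
    ancestry: str
    study_design: str
    demographics: Dict[str, str] = field(default_factory=dict)
    effect_modifiers: List[str] = field(default_factory=list)

    @property
    def rs_id(self) -> Optional[str]:
        """Standardized identifier, so that differently named reports can be grouped."""
        return normalize_snp(self.snp_as_reported)


record = StudyRecord(
    citation="Hypothetical et al. (2007)",  # invented placeholder, not a real study
    gene="MTHFR",
    snp_as_reported="Ala222Val",
    cases=GenotypeCounts(100, 80, 20),      # invented counts
    controls=GenotypeCounts(120, 70, 10),   # invented counts
    ancestry="not reported",
    study_design="case-control",
)
print(record.rs_id)
```

Keeping the name as reported alongside the normalized identifier preserves what the article actually said, while still allowing studies of the same variant to be grouped for meta-analysis.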
Improving the quality of reporting genotype–phenotype association
The situation of GASs is similar to the problem clinical medicine faced more than 10 years ago, when the large number of randomized controlled trials of varying quality befuddled clinicians (Lau et al. 1992). Then, concerted efforts by the medical community were initiated to improve the quality of reporting in research publications. For example, the CONSORT statement (Begg et al. 1996) was published with a view to improving the quality of the reporting of randomized controlled trials, the QUOROM statement (Moher et al. 1999) focused on the reporting of meta-analyses, and the STARD initiative (Bossuyt et al. 2003) was developed to improve the reporting of studies of diagnostic accuracy. The Cochrane Collaboration was created in response to the need for collecting, synthesizing, and disseminating evidence on the effects of health care interventions. This is a successful model that should be emulated by other scientific disciplines that have similar needs to synthesize evidence.
The need for data sharing has been highlighted by the Genetic Association Information Network (GAIN) initiative, in an effort to facilitate the subsequent and joint analysis of GWAS data (Thomas and Witte 2002). All genotypes will be made public as they are generated and checked for quality. However, the sheer scale of GWAS data will pose significant practical challenges regarding the form of the stored statistical results and their suitability for meta-analysis.
Conclusion
Given the rapid accumulation of evidence regarding gene–disease associations, a Web-based system for the storage and automated meta-analysis of results from genetic association studies (GASs) and genome-wide association studies (GWASs) is essential to: (1) keep track of the evidence for gene–disease associations, (2) improve the quality of reporting, and (3) reduce the effort and labor involved in performing meta-analyses.
References
Begg C, Cho M, Eastwood S, Horton R, Moher D, Olkin I, Pitkin R, Rennie D, Schulz KF, Simel D, Stroup DF (1996) Improving the quality of reporting of randomized controlled trials. The CONSORT statement. JAMA 276:637–639
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Moher D, Rennie D, de Vet HC, Lijmer JG; Standards for Reporting of Diagnostic Accuracy (2003) The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 49:7–18
Cardon LR, Palmer LJ (2003) Population stratification and spurious allelic association. Lancet 361:598–604
Colhoun HM, McKeigue PM, Davey Smith G (2003) Problems of reporting genetic associations with complex outcomes. Lancet 361:865–872
Donahue MP, Allen AS (2005) Genetic association studies in cardiology. Am Heart J 149:964–970
Evangelou E, Maraganore DM, Ioannidis JP (2007) Meta-analysis in genome-wide association datasets: strategies and application in Parkinson disease. PLoS ONE 2:e196
Hirschhorn JN, Daly MJ (2005) Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6:95–108
Hirschhorn JN, Lohmueller K, Byrne K, Hirschhorn K (2002) A comprehensive review of genetic association studies. Genet Med 4:45–61
Kitsios G, Zintzaras E (2007) Genetic variation associated with ischemic heart failure: a HuGE review and meta-analysis. Am J Epidemiol 166:619–633
Lacroix Z (2002) Biological data integration: wrapping data and tools. IEEE Trans Inf Technol Biomed 6:123–128
Lau J, Antman EM, Jimenez-Silva J, Kupelnick B, Mosteller F, Chalmers TC (1992) Cumulative meta-analysis of therapeutic trials for myocardial infarction. N Engl J Med 327:248–254
Moher D, Cook DJ, Eastwood S, Olkin I, Rennie D, Stroup DF (1999) Improving the quality of reports of meta-analyses of randomised controlled trials: the QUOROM statement. Quality of Reporting of Meta-analyses. Lancet 354:1896–1900
Munafò MR, Flint J (2004) Meta-analysis of genetic association studies. Trends Genet 20:439–444
NCI-NHGRI Working Group on Replication in Association Studies (2007) Replicating genotype–phenotype associations. Nature 447:655–660
Neale BM, Sham PC (2004) The future of association studies: gene-based analysis and replication. Am J Hum Genet 75:353–362
Teufel A, Krupp M, Weinmann A, Galle PR (2006) Current bioinformatics tools in genomic biomedical research. Int J Mol Med 17:967–973
Thomas DC, Witte JS (2002) Point: population stratification: a problem for case–control studies of candidate–gene associations? Cancer Epidemiol Biomarkers Prev 11:505–512
Wang WY, Barratt BJ, Clayton DG, Todd JA (2005) Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 6:109–118
Zdoukopoulos N, Zintzaras E (2007) Genetic markers in placental abruption: a HuGE review and meta-analysis. Epidemiology (in press)
Zintzaras E (2006a) C677T and A1298C methylenetetrahydrofolate reductase gene polymorphisms in schizophrenia, bipolar disorder and depression: a meta-analysis of genetic association studies. Psychiatr Genet 16:105–115
Zintzaras E (2006b) Methylenetetrahydrofolate reductase gene and susceptibility to breast cancer: a meta-analysis. Clin Genet 69:327–336
Zintzaras E (2007) Brain-derived neurotrophic factor gene polymorphisms and schizophrenia: a meta-analysis. Psychiatr Genet 17:69–75
Zintzaras E, Hadjigeorgiou GM (2004) Association of paraoxonase 1 gene polymorphisms with risk of Parkinson’s disease: a meta-analysis. J Hum Genet 49:474–481
Zintzaras E, Hadjigeorgiou GM (2005) The role of G196A polymorphism in the brain-derived neurotrophic factor gene in the cause of Parkinson’s disease: a meta-analysis. J Hum Genet 50:560–566
Zintzaras E, Ioannidis JP (2005) Heterogeneity testing in meta-analysis of genome searches. Genet Epidemiol 28:123–137
Zintzaras E, Ioannidis JPA (2007) Meta-analysis for ranked discovery datasets: theoretical framework and empirical demonstration for microarrays. Comput Biol Chem. doi:10.1016/j.compbiolchem.2007.09.003
Zintzaras E, Lau J (2007) Synthesizing genetic association studies: methodological and statistical considerations. J Clin Epidemiol (in press)
Zintzaras E, Stefanidis I (2005) Association between the GLUT1 gene polymorphism and the risk of diabetic nephropathy: a meta-analysis. J Hum Genet 50:84–91
Zintzaras E, Sakelaridis N (2007) Is 472G/A catechol-O-methyl-transferase gene polymorphism related to panic disorder? Psychiatr Genet 17:267–273
Zintzaras E, Kitsios G, Stefanidis I (2006a) Endothelial NO synthase gene polymorphisms and hypertension: a meta-analysis. Hypertension 48:700–710
Zintzaras E, Stefanidis I, Santos M, Vidal F (2006b) Do alcohol-metabolizing enzyme gene polymorphisms increase the risk of alcoholism and alcoholic liver disease? Hepatology 43:352–361
Zintzaras E, Uhlig K, Koukoulis GN, Papathanasiou AA, Stefanidis I (2007) Methylenetetrahydrofolate reductase gene polymorphism as a risk factor for diabetic nephropathy: a meta-analysis. J Hum Genet 52:881–890
Zintzaras, E., Lau, J. Trends in meta-analysis of genetic association studies. J Hum Genet 53, 1–9 (2008). https://doi.org/10.1007/s10038-007-0223-5
DOI: https://doi.org/10.1007/s10038-007-0223-5