Validating, augmenting and refining genome-wide association signals

Ioannidis, John P. A.; Thomas, Gilles; Daly, Mark J.

doi:10.1038/nrg2544

Review Article
Published: May 2009

John P. A. Ioannidis^1,2,
Gilles Thomas^3,4 &
Mark J. Daly^5,6

Nature Reviews Genetics volume 10, pages 318–329 (2009)

Key Points

Genome-wide association studies have yielded a large number of association signals with robust statistical support, but these are only markers of the true functional variants.
Reliable identification of the true functional variants can be notoriously difficult, but a series of methods could be helpful in this regard.
Large-scale exact replication to achieve robust statistical credibility of a marker should precede efforts at finding the causative variants.
Fine mapping and resequencing might help to identify more informative markers and multiple independent informative loci.
Functional information could fine-tune the credibility of different variants for being the causative variant.
Additional insights might be obtained by more extensive phenotype mapping of proposed variants.

Abstract

Studies using genome-wide platforms have yielded an unprecedented number of promising signals of association between genomic variants and human traits. This Review addresses the steps required to validate, augment and refine such signals to identify underlying causal variants for well-defined phenotypes. These steps include: large-scale exact replication across both similar and diverse populations; fine mapping and resequencing; determination of the most informative markers and multiple independent informative loci; incorporation of functional information; and improved phenotype mapping of the implicated genetic effects. Even in cases for which replication proves that an effect exists, confident localization of the causal variant often remains elusive.

Acknowledgements

Scientific support for this project was provided through the Tufts Clinical and Translational Science Institute (Tufts CTSI) under funding from the National Institutes of Health/National Center for Research Resources (UL1 RR025752). Points of view or opinions in this paper are those of the authors and do not necessarily represent the official position or policies of the Tufts CTSI.

Author information

Authors and Affiliations

Department of Hygiene and Epidemiology, Clinical and Molecular Epidemiology Unit, University of Ioannina School of Medicine and Biomedical Research Institute, Foundation for Research and Technology — Hellas, Ioannina, 45110, Greece
John P. A. Ioannidis
Center for Genetic Epidemiology and Modelling, Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, and Tufts Clinical and Translational Science Institute, Boston, Tufts University School of Medicine, Boston, 02111, Massachusetts, USA
John P. A. Ioannidis
Division of Cancer Epidemiology and Genetics, Department of Health and Human Services, National Cancer Institute, National Institutes of Health, Bethesda, 20892, Maryland, USA
Gilles Thomas
Fondation Synergie, INSERM U590, Centre Léon Bérard, 28 Rue Laënnec, Lyon, 69373, Cedex 08, France
Gilles Thomas
Center for Human Genetic Research, Massachusetts General Hospital, Richard B. Simches Research Center, Boston, 02114, Massachusetts, USA
Mark J. Daly
The Broad Institute of Harvard and MIT, Cambridge, 02142, Massachusetts, USA
Mark J. Daly

Corresponding author

Correspondence to John P. A. Ioannidis.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Glossary

Pleiotropy: The effect of a gene on more than one phenotype or disease.
Meta-analysis: An analysis that combines the evidence from multiple data sets.
Odds ratio: A measurement of association that is commonly used in case–control studies. It is defined as the odds of exposure to the susceptible genetic variant in cases compared with the odds of exposure in controls. If the odds ratio differs significantly from one, then the genetic variant is associated with the disease: values greater than one indicate increased risk and values less than one indicate a protective effect.
Cochran–Armitage test: A genotype-based contingency table test for association that is well suited to the detection of trends across ordinal categories (in this case, genotypes).
r²: (Squared correlation coefficient). For linkage disequilibrium, it provides a measure of the strength (but not the direction) of a linear relationship between the genotypes of two variants, each expressed as a number of minor alleles.
Proxy: A highly correlated DNA variant that is an adequate substitute in an association study.
Detection probability: For a two-stage design, this is the probability that a disease-associated SNP will have a p value among the lowest ranks of p values at stage 1 and, among those SNPs selected at stage 1, that a disease-associated SNP will also have a p value among the lowest ranks of p values at stage 2.
Hardy–Weinberg equilibrium: A theoretical description of the relationship between genotype and allele frequencies that is based on an expectation in a stable population undergoing random mating in the absence of selection, new mutations and gene flow. Under these conditions, and in the absence of linkage disequilibrium, the genotype frequencies are equal to the product of the allele frequencies.
Imputation accuracy: This describes the different ways to treat missing genotypes in a data set. Imputed genotypes with less than a pre-specified accuracy can be considered missing or genotypes can be weighted in the calculations on the basis of the estimated imputation accuracy.
Population stratification: The situation that arises when a population contains several subpopulations that differ in their genetic characteristics.
Frequentist: A statistical approach for assessing whether a hypothesis is correct or an alternative should be adopted.
Markov chain Monte Carlo: An iterative computational approach for identifying the most likely model among many possible models.
Phasing: The determination of the haplotype phase (the arrangement of alleles at two loci on homologous chromosomes) from genotype data using statistical methods.
Winner's curse: The inflation of effect sizes compared with the true effect size for associations that are discovered on the basis of passing specific statistical significance or other selection thresholds.
I²: A metric of between-study heterogeneity taking values between 0 and 100%, which describes how much of the observed between-study variability in effect estimates is beyond chance.
Fixed effects model: A set of methods for combining data that assumes there is a common effect in all data sets and that observed effects only differ by chance.
Random effects model: A set of methods for combining data that assumes that genetic effects are different across different populations.
Phenotype misclassification: This describes the situation in which cases are classified as controls or controls are classified as cases for binary outcomes. The equivalent problem for continuous traits is measurement error.
Nested case–control: A design in which cases and controls are sampled from a pre-existing larger cohort.
Convenience sample: A sample of controls or of cases with a trait of interest that is available for another purpose and has not been collected for the purpose of the specific research project or with an explicit sampling scheme.
Principal components analysis: A statistical method used to simplify data sets by transforming a series of correlated variables into a smaller number of uncorrelated factors.
Copy number variant: A class of DNA sequence variants (including deletions and duplications) that lead to a departure from the expected diploid representation of DNA sequence.
Recombination hot spot: A small (usually one to a few kilobases) chromosomal region in which the frequency of meiotic recombination is much higher than average. Hot spots of recombination can be recognized by observing that all pairs of SNPs that encompass the region have a low D′ value.
Gene desert: A stretch of the genome that contains no known protein-coding gene.
Expression quantitative trait locus: A locus at which genetic allelic variation is associated with variation in gene expression.
Bayes factor: The ratio of the prior probabilities of the null hypothesis compared with the alternative hypotheses over the ratio of the posterior probabilities. This can be interpreted as the relative odds that the hypothesis is true before and after examining the data.
Regression model: A model that evaluates the association of one or more variables with an outcome of interest.
Overfitting: In a regression model, the tendency to obtain better fit to the available data than to other independent data.
Bayesian method: Any approach that uses a combination of prior beliefs and observed data to generate posterior beliefs.
Endophenotype: A physiological or other trait that is related to a disease trait and is measured independently of the disease.
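
The glossary entries for I², the fixed effects model and the random effects model can be made concrete with a small worked example. The sketch below is illustrative only: it assumes per-study effect estimates (for example, log odds ratios) with their standard errors, uses inverse-variance weighting, and uses the DerSimonian–Laird estimator for the between-study variance, which is one common choice and is not prescribed by this Review. The function name `meta_analysis` and its arguments are hypothetical.

```python
import math


def meta_analysis(betas, ses):
    """Combine per-study effects (betas) given their standard errors (ses).

    Returns fixed and random effects summaries, Cochran's Q and I^2 (in %).
    Assumption: inverse-variance weights and the DerSimonian-Laird
    estimator of the between-study variance (tau^2).
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two studies with matching inputs")

    # Fixed effects: one common effect, studies differ only by chance.
    weights = [1.0 / se ** 2 for se in ses]
    sum_w = sum(weights)
    fixed = sum(w * b for w, b in zip(weights, betas)) / sum_w
    fixed_se = math.sqrt(1.0 / sum_w)

    # Cochran's Q: weighted squared deviations from the fixed estimate.
    q = sum(w * (b - fixed) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1

    # I^2: share of the variability across studies that is beyond chance.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # DerSimonian-Laird between-study variance, truncated at zero.
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random effects: true effects may differ; weights include tau^2.
    rweights = [1.0 / (se ** 2 + tau2) for se in ses]
    sum_rw = sum(rweights)
    random = sum(w * b for w, b in zip(rweights, betas)) / sum_rw
    random_se = math.sqrt(1.0 / sum_rw)

    return {
        "fixed": fixed, "fixed_se": fixed_se,
        "random": random, "random_se": random_se,
        "q": q, "i2": i2, "tau2": tau2,
    }


if __name__ == "__main__":
    # Two hypothetical studies with equal precision but different effects.
    result = meta_analysis([0.1, 0.3], [0.1, 0.1])
    for key, value in result.items():
        print(f"{key}: {value:.4f}")
```

With heterogeneous inputs, the random effects summary keeps the same kind of weighted average but has a wider standard error than the fixed effects summary, because the between-study variance is added to each study's own variance.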


About this article

Cite this article

Ioannidis, J., Thomas, G. & Daly, M. Validating, augmenting and refining genome-wide association signals. Nat Rev Genet 10, 318–329 (2009). https://doi.org/10.1038/nrg2544

Issue Date: May 2009
DOI: https://doi.org/10.1038/nrg2544

This article is cited by

Phenome-wide association study on miRNA-related sequence variants: the UK Biobank
- Rima Mustafa
- Mohsen Ghanbari
- Abbas Dehghan
Human Genomics (2023)
Functional genomics identify causal variant underlying the protective CTSH locus for Alzheimer’s disease
- Yu Li
- Min Xu
- Yong-Gang Yao
Neuropsychopharmacology (2023)
Associating complex traits with genetic variants: polygenic risk scores, pleiotropy and endophenotypes
- Gene S. Fisch
Genetica (2022)
LINC01149 variant modulates MICA expression that facilitates hepatitis B virus spontaneous recovery but increases hepatocellular carcinoma risk
- Rong Zhong
- Jianbo Tian
- Xiaoping Miao
Oncogene (2020)
Genetic and metabolic signatures of Salmonella enterica subsp. enterica associated with animal sources at the pangenomic scale
- Meryl Vila Nova
- Kévin Durimel
- Nicolas Radomski
BMC Genomics (2019)
