Validating, augmenting and refining genome-wide association signals

Key Points

  • Genome-wide association studies have yielded a large number of association signals with robust statistical support, but these are only markers of the true functional variants.

  • Reliable identification of the true functional variants can be notoriously difficult, but a series of methods could be helpful in this regard.

  • Large-scale exact replication to achieve robust statistical credibility of a marker should precede efforts at finding the causative variants.

  • Fine mapping and resequencing might help to identify more informative markers and multiple independent informative loci.

  • Functional information could fine-tune the credibility of different variants as candidates for the causative variant.

  • Additional insights might be obtained by more extensive phenotype mapping of proposed variants.


Studies using genome-wide platforms have yielded an unprecedented number of promising signals of association between genomic variants and human traits. This Review addresses the steps required to validate, augment and refine such signals to identify underlying causal variants for well-defined phenotypes. These steps include: large-scale exact replication across both similar and diverse populations; fine mapping and resequencing; determination of the most informative markers and multiple independent informative loci; incorporation of functional information; and improved phenotype mapping of the implicated genetic effects. Even in cases for which replication proves that an effect exists, confident localization of the causal variant often remains elusive.


Figure 1: Putting it in order.




Scientific support for this project was provided through the Tufts Clinical and Translational Science Institute (Tufts CTSI) under funding from the National Institutes of Health/National Center for Research Resources (UL1 RR025752). Points of view or opinions in this paper are those of the authors and do not necessarily represent the official position or policies of the Tufts CTSI.

Author information



Corresponding author

Correspondence to John P. A. Ioannidis.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links




Crohn's disease


type 2 diabetes


John P. A. Ioannidis' homepage

1000 Genomes Project

NHGRI GWA studies catalogue

Nature Reviews Genetics Series on Genome-wide association studies



Glossary

Pleiotropy

The effect of a gene on more than one phenotype or disease.


Meta-analysis

An analysis that combines the evidence from multiple data sets.

Odds ratio

A measurement of association that is commonly used in case–control studies. It is defined as the odds of exposure to the susceptible genetic variant in cases compared with the odds of exposure in controls. If the odds ratio is significantly greater than one, then the genetic variant is associated with the disease.
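As a minimal sketch of the definition above, the odds ratio can be computed from a 2×2 table of allele counts. The counts below are hypothetical, and the 95% confidence interval uses the standard log (Woolf) approximation, which is an assumption of this example rather than something the glossary specifies.

```python
import math

def odds_ratio(case_exposed, case_unexposed, control_exposed, control_unexposed):
    """Return (odds ratio, lower 95% limit, upper 95% limit).

    The interval uses the log (Woolf) approximation; all four counts
    must be greater than zero.
    """
    or_ = (case_exposed * control_unexposed) / (case_unexposed * control_exposed)
    se_log = math.sqrt(1 / case_exposed + 1 / case_unexposed
                       + 1 / control_exposed + 1 / control_unexposed)
    lower = math.exp(math.log(or_) - 1.96 * se_log)
    upper = math.exp(math.log(or_) + 1.96 * se_log)
    return or_, lower, upper

# Hypothetical allele counts: risk allele vs other allele, in cases and controls.
or_, lower, upper = odds_ratio(600, 1400, 500, 1500)
print(f"OR = {or_:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

An interval that lies entirely above one corresponds to the glossary's "significantly greater than one" at roughly the 5% level.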

Cochran–Armitage test

A genotype-based contingency table test for association that is well suited to the detection of trends across ordinal categories (in this case, genotypes).


r

(Correlation coefficient). For linkage disequilibrium, it provides a measure of the strength and direction of a linear relationship between the genotypes of two variants expressed as a number of minor alleles.
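The correlation coefficient described above can be illustrated with a short sketch. It assumes that each genotype is coded as the number of minor alleles (0, 1 or 2) and that both variants are polymorphic in the sample; the genotype vectors are hypothetical.

```python
def genotype_correlation(x, y):
    """Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical minor-allele counts for six individuals at two variants.
snp_a = [0, 1, 2, 1, 0, 2]
snp_b = [0, 1, 2, 1, 0, 2]          # perfectly correlated with snp_a
snp_c = [2 - g for g in snp_a]      # same variant with the alleles swapped

r_same = genotype_correlation(snp_a, snp_b)
r_flipped = genotype_correlation(snp_a, snp_c)
```

The sign of the result depends only on which allele is counted, which is why the squared value is often reported when only the strength of the relationship matters.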


A highly correlated DNA variant that is an adequate substitute in an association study.

Detection probability

For a two-stage design, this is the probability that a disease-associated SNP will have a p value among the lowest ranks of p values at stage 1 and, among those SNPs selected at stage 1, that a disease-associated SNP will also have a p value among the lowest ranks of p values at stage 2.

Hardy–Weinberg equilibrium

A theoretical description of the relationship between genotype and allele frequencies that is based on an expectation in a stable population undergoing random mating in the absence of selection, new mutations and gene flow. Under these conditions, and in the absence of linkage disequilibrium, the genotype frequencies are equal to the product of the allele frequencies.
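For a biallelic marker, the relationship above gives expected genotype frequencies of p², 2pq and q². The sketch below compares observed genotype counts with these expectations using a simple chi-square statistic. The counts are hypothetical, and the function assumes that both alleles are present in the sample.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return expected genotype counts under Hardy-Weinberg equilibrium
    and the chi-square statistic comparing them with the observed counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2

# Hypothetical counts: one sample in equilibrium, one with too few heterozygotes.
expected_eq, chi2_eq = hwe_chi_square(250, 500, 250)
expected_dev, chi2_dev = hwe_chi_square(300, 400, 300)
```

The statistic has one degree of freedom for a biallelic marker; exact tests are usually preferred when genotype counts are small.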

Imputation accuracy

This describes the different ways to treat missing genotypes in a data set. Imputed genotypes with less than a pre-specified accuracy can be considered missing or genotypes can be weighted in the calculations on the basis of the estimated imputation accuracy.

Population stratification

The situation that arises when a population contains several subpopulations that differ in their genetic characteristics.


A statistical approach for assessing whether a hypothesis is correct or an alternative should be adopted.

Markov chain Monte Carlo

An iterative computational approach for identifying the most likely model among many possible models.


The determination of the haplotype phase (the arrangement of alleles at two loci on homologous chromosomes) from genotype data using statistical methods.

Winner's curse

The inflation of effect sizes compared with the true effect size for associations that are discovered on the basis of passing specific statistical significance or other selection thresholds.
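A small simulation can make this inflation concrete. The sketch below draws many noisy estimates of one true effect and averages only those that pass a significance threshold. The true effect, standard error, threshold and seed are arbitrary illustrative values, not figures from the article.

```python
import random
import statistics

def simulate_winners_curse(true_effect=0.10, std_error=0.05,
                           z_threshold=3.0, n_studies=20000, seed=1):
    """Simulate effect estimates and return the mean over all studies,
    the mean over 'discovered' studies, and the number discovered."""
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, std_error) for _ in range(n_studies)]
    # A study counts as a discovery only if its z score passes the threshold.
    selected = [b for b in estimates if b / std_error > z_threshold]
    return statistics.mean(estimates), statistics.mean(selected), len(selected)

mean_all, mean_selected, n_selected = simulate_winners_curse()
print(f"all studies: {mean_all:.3f}; discoveries only: {mean_selected:.3f}")
```

Averaged over all simulated studies the estimate is close to the true effect, whereas the average over discoveries alone is necessarily larger, because only estimates above the threshold are retained.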


I²

A metric of between-study heterogeneity taking values between 0 and 100%, which describes how much of the between-study heterogeneity is beyond chance.

Fixed effects model

A set of methods for combining data that assumes there is a common effect in all data sets and that observed effects only differ by chance.

Random effects model

A set of methods for combining data that assumes that genetic effects are different across different populations.
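The fixed effects and random effects models, together with the heterogeneity metric defined above, can be sketched in a few lines. This example assumes inverse-variance weighting for the fixed effects summary and the DerSimonian–Laird estimator of between-study variance for the random effects summary. These are common choices, but the glossary does not prescribe them, and the input values are hypothetical.

```python
def meta_analysis(effects, std_errors):
    """Combine per-study effects (for example log odds ratios).

    Returns the fixed effects summary, the random effects summary
    (DerSimonian-Laird), Cochran's Q, the heterogeneity metric in
    percent, and the between-study variance tau^2.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # Share of variability beyond chance, truncated at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    random_eff = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    return {"fixed": fixed, "random": random_eff, "Q": q, "I2": i2, "tau2": tau2}

# Hypothetical log odds ratios and standard errors from two data sets.
result = meta_analysis([0.1, 0.5], [0.1, 0.1])
homogeneous = meta_analysis([0.2, 0.2, 0.2], [0.1, 0.1, 0.1])
```

When the studies agree, the between-study variance is estimated as zero and the two summaries coincide; as heterogeneity grows, the random effects weights become more equal across studies.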

Phenotype misclassification

This describes the situation in which cases are classified as controls or controls are classified as cases for binary outcomes. The equivalent problem for continuous traits is measurement error.

Nested case–control

A design in which cases and controls are sampled from a pre-existing larger cohort.

Convenience sample

A sample of controls or of cases with a trait of interest that is available for another purpose and has not been collected for the purpose of the specific research project or with an explicit sampling scheme.

Principal components analysis

A statistical method used to simplify data sets by transforming a series of correlated variables into a smaller number of uncorrelated factors.

Copy number variant

A class of DNA sequence variants (including deletions and duplications) that lead to a departure from the expected diploid representation of DNA sequence.

Recombination hot spot

A small (usually one to a few kilobases) chromosomal region in which the frequency of meiotic recombination is much higher than average. Hot spots of recombination can be recognized by observing that all pairs of SNPs that encompass the region have a low D′ value.

Gene desert

A stretch of the genome that contains no known protein-coding gene.

Expression quantitative trait locus

A locus at which genetic allelic variation is associated with variation in gene expression.

Bayes factor

The ratio of the prior probabilities of the null hypothesis compared with the alternative hypotheses over the ratio of the posterior probabilities. This can be interpreted as the relative odds that the hypothesis is true before and after examining the data.
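Read this way, the Bayes factor is the multiplier that converts prior odds into posterior odds. The sketch below applies it to the odds in favour of an association, with the Bayes factor expressed in favour of the alternative hypothesis. The prior probability and Bayes factor values are hypothetical.

```python
def posterior_probability(prior_probability, bayes_factor):
    """Posterior probability of an association, given its prior probability
    and a Bayes factor expressed in favour of the association."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical values: a low prior probability of 1 in 10,000 and strong data.
post_strong = posterior_probability(1e-4, 1e4)
# A Bayes factor of 1 means the data do not change the prior belief.
post_neutral = posterior_probability(0.5, 1.0)
```

With a prior of 1 in 10,000, a Bayes factor of 10,000 only brings the posterior probability to about one half, which illustrates why very strong evidence is needed when prior odds are low.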

Regression model

A model that evaluates the association of one or more variables with an outcome of interest.


Overfitting

In a regression model, the tendency to obtain better fit to the available data than to other independent data.

Bayesian method

Any approach that uses a combination of prior beliefs and observed data to generate posterior beliefs.


Endophenotype

A physiological or other trait that is related to a disease trait and is measured independently of the disease.


About this article

Cite this article

Ioannidis, J., Thomas, G. & Daly, M. Validating, augmenting and refining genome-wide association signals. Nat Rev Genet 10, 318–329 (2009).
