  • Review Article

Genome-wide association studies: theoretical and practical concerns

Key Points

  • Recent improvements in genotyping technology and in our knowledge of human genetic variation have made it possible to carry out genome-wide genetic association studies to identify susceptibility genes for common disease. However, before such studies are undertaken, it is important to know what proportion of human genetic variation they are likely to survey and what the likely costs will be per true-positive result.

  • The allelic spectrum of complex diseases (that is, the range of frequencies and effect sizes of susceptibility loci) will influence the success of genome-wide association studies. Large sample sizes will be required to search even the most accessible end of this spectrum: that is, alleles with population allele frequencies exceeding 0.1 and with odds ratios of 1.3 and above.

  • The existence of extensive regions of linkage disequilibrium in the human genome will greatly reduce the cost of genotyping in genome-wide association studies, as this allows the use of tag SNPs that provide information on a number of other SNPs that are not directly genotyped.

  • However, as many as 30% of the common variants in the genome remain unknown, and the latest high-throughput technologies convert only about 50% of SNPs into robust assays. In addition, extensive resequencing and genotyping will have to be carried out to ensure that SNPs in regions of low linkage disequilibrium are surveyed comprehensively.

  • Other factors that need to be taken into account in the design and execution of initial genome-wide association studies, to avoid loss of statistical power, include possible selection bias, population substructure and misclassification errors.
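
To give a feel for why large samples are needed at allele frequencies near 0.1 and odds ratios near 1.3, the following is a minimal sketch of a sample-size calculation. It is not the authors' method: it assumes a simple allelic test that compares risk-allele frequencies between equal numbers of cases and controls, a normal approximation, two independent alleles per person, and a control allele frequency that approximates the population frequency. The function names and the default `alpha` and `power` values are illustrative assumptions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds_in_cases = odds_ratio * p_control / (1.0 - p_control)
    return odds_in_cases / (1.0 + odds_in_cases)


def cases_needed(p_control: float, odds_ratio: float,
                 alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of cases (with an equal number of controls).

    Uses the standard two-proportion sample-size formula on allele counts,
    then halves the result because each person contributes two alleles
    (an assumption that holds under Hardy-Weinberg proportions).
    """
    if odds_ratio == 1.0:
        raise ValueError("an odds ratio of 1 implies no effect to detect")
    p_case = case_allele_freq(p_control, odds_ratio)
    p_bar = (p_case + p_control) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * sqrt(p_case * (1.0 - p_case) + p_control * (1.0 - p_control))
    ) ** 2
    alleles_per_group = numerator / (p_case - p_control) ** 2
    return ceil(alleles_per_group / 2.0)


if __name__ == "__main__":
    # Illustrative thresholds only; a genome-wide scan would need a much
    # smaller alpha than a single test.
    for alpha in (0.05, 1e-4, 1e-6):
        print(alpha, cases_needed(0.1, 1.3, alpha=alpha))
```

Under these assumptions the required number of cases rises steeply as the significance threshold is tightened to allow for the many markers tested in a genome-wide scan, and falls as the odds ratio or allele frequency increases.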

Abstract

To fully understand the allelic variation that underlies common diseases, complete genome sequencing for many individuals with and without disease is required. This is still not technically feasible. However, recently it has become possible to carry out partial surveys of the genome by genotyping large numbers of common SNPs in genome-wide association studies. Here, we outline the main factors — including models of the allelic architecture of common diseases, sample size, map density and sample-collection biases — that need to be taken into account in order to optimize the cost efficiency of identifying genuine disease-susceptibility loci.

Figure 1: Effects of allele frequency on sample-size requirements.
Figure 2: Models of the risks conferred by disease-associated variants.
Figure 3: Possible allelic spectra of human diseases.
Figure 4: Putative distribution of phenotypic-effect sizes among disease-susceptibility variants.

Acknowledgements

W.Y.S.W. received scholarships from the University of Cambridge, the University of Sydney and Gonville and Caius College, Cambridge, UK. This work was financed by the Wellcome Trust and the Juvenile Diabetes Research Foundation International.

Author information

Corresponding author

Correspondence to John A. Todd.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

DATABASES

OMIM

Type 1 diabetes

Type 2 diabetes

FURTHER INFORMATION

David Clayton's tag SNP web site

International HapMap Project

NCBI Single Nucleotide Polymorphism database web site

Perlegen Sciences, Inc.

T1DBase — a genetics and bioinformatics resource for type 1 diabetes researchers

University of Washington and Fred Hutchinson Cancer Research Center Variation Discovery Resource database

Glossary

LINKAGE ANALYSIS

Mapping genes by typing genetic markers in families to identify chromosome regions that are associated with disease or trait values within pedigrees more often than expected by chance. Such linked regions are more likely to contain a causal genetic variant.

AFFECTED SIB-PAIR (ASP) STUDIES

Linkage studies that are based on the collection of a large number of families, each consisting of affected siblings and, if available, their parents. In linkage analyses, these studies rely on the principle that ASPs share, on average, half of their chromosomes by chance; a region shared more often than this is a candidate for linkage to the disease.

LINKAGE DISEQUILIBRIUM

The non-random association of alleles of different linked polymorphisms in a population.
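
As an illustration of how this non-random association is commonly quantified, the sketch below computes three standard pairwise measures (D, D′ and r²) for two biallelic loci. It assumes that phased haplotype frequencies are already known; the function name and argument names are hypothetical.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and
          allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    # D is the excess of the AB haplotype over its expectation under independence.
    d = p_ab - p_a * p_b
    # D' rescales D by the largest magnitude it could take at these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1.0 - p_b), (1.0 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1.0 - p_a) * (1.0 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the alleles at the two loci.
    r_squared = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, d_prime, r_squared


if __name__ == "__main__":
    print(ld_measures(0.5, 0.5, 0.5))    # alleles always travel together
    print(ld_measures(0.25, 0.5, 0.5))   # alleles are independent
```

An r² close to 1 means that one SNP is an almost perfect proxy for the other, which is the property that tag-SNP approaches rely on.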

MINOR ALLELE FREQUENCY

(MAF). The frequency of the less common allele of a polymorphic locus. It has a value that lies between 0 and 0.5, and can vary between populations.

ODDS RATIO

A measurement of association that is commonly used in case-control studies. It is defined as the odds of exposure to the susceptible genetic variant in cases compared with that in controls. If the odds ratio is significantly greater than one, then the genetic variant is associated with the disease.
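
The definition above can be made concrete with a short worked sketch. It assumes a 2×2 table of allele counts (risk allele versus other allele, in cases versus controls) and uses the common log-scale (Woolf) approximation for the confidence interval; the function name and the example counts are invented for illustration.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_with_ci(risk_cases: int, other_cases: int,
                       risk_controls: int, other_controls: int,
                       level: float = 0.95) -> tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) from allele counts."""
    counts = (risk_cases, other_cases, risk_controls, other_controls)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    # Odds of carrying the risk allele in cases divided by the odds in controls.
    odds_ratio = (risk_cases * other_controls) / (other_cases * risk_controls)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = sqrt(sum(1.0 / c for c in counts))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 30 risk / 70 other alleles in cases,
    # 20 risk / 80 other alleles in controls.
    print(odds_ratio_with_ci(30, 70, 20, 80))
```

In this sketch, an odds ratio would be called significantly greater than one at the chosen level when the lower bound of the interval lies above one.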

QUANTITATIVE TRAIT LOCI

Genetic loci that contribute to variations in quantitative (that is, continuous) phenotypes.

PURIFYING SELECTION

Evolutionary selective forces that reduce the frequency of specific polymorphisms that have phenotypic effects.

POSITIVE SELECTION

The effect of evolutionary selective forces that favour certain variants and tend to increase their allele frequencies.

GENETIC DRIFT

Changes in allele frequencies in a population from one generation to another as the result of chance events in mating, meiosis and number of offspring.

SIBLING RELATIVE-RECURRENCE RISK

The risk of developing disease in a sibling of an affected individual relative to that of an individual in the general population. Commonly used as an indication of the heritability of a disease.

HAPLOTYPE

A set of alleles that is present on a single chromosome.

GENE CONVERSION

A non-reciprocal recombination process that results in an alteration of the sequence of a gene to that of its homologue during meiosis.

SUBGROUP ANALYSES

In genome-wide association analyses, or any other association study, there is a very low prior probability that any given locus or region is associated with disease. If the samples or data are divided into subgroups, for example, in analysis of epistatic interactions between loci — a departure from statistical independence in the joint distributions of genotypes between the loci — then the prior probability of a true positive is even lower.
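
One way to see why a low prior probability matters is to compute the chance that a statistically significant result is a true positive. The sketch below applies Bayes' rule to a single test; it is a standard textbook formulation rather than a calculation taken from this article, and the prior, power and significance values are illustrative assumptions.

```python
def positive_predictive_value(prior: float, power: float, alpha: float) -> float:
    """Probability that a significant association is genuine.

    prior: probability, before testing, that the locus is truly associated
    power: probability of a significant result when the association is real
    alpha: probability of a significant result when there is no association
    """
    if not 0.0 < prior <= 1.0:
        raise ValueError("prior must lie in (0, 1]")
    true_positives = power * prior
    false_positives = alpha * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Hypothetical priors: the same test becomes less trustworthy as the
    # prior probability of a true association shrinks.
    for prior in (0.5, 1e-2, 1e-4):
        print(prior, positive_predictive_value(prior, power=0.8, alpha=0.05))
```

Splitting the data into subgroups lowers both the prior for each individual test and, because each subgroup is smaller, the power, so the proportion of significant results that are genuine falls on both counts.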

POPULATION STRATIFICATION

The presence of several population subgroups that show limited interbreeding. When such subgroups differ both in allele frequency and in disease prevalence, this can lead to erroneous results in association studies.

ADMIXTURE

The mixture of two or more genetically distinct populations.

ANCESTRY INFORMATIVE MARKERS

Genetic markers that have different frequencies between populations and can be used to readily estimate the ancestral origins of a person or population.

COHORT STUDIES

Observational studies in which defined groups of people (the cohorts) are followed over time and outcomes are compared in subsets of the cohort who were exposed to different levels of factors of interest. These studies can either be performed prospectively or retrospectively from historical records.

GENOMIC CONTROL

A statistical genetics approach that provides an adjustment of the chi-squared threshold for statistical significance in a genetic association study to help allow for population sub-structure effects.
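
A minimal sketch of the idea follows, assuming 1-degree-of-freedom chi-squared statistics from a set of markers that are presumed to be unassociated with disease. The inflation factor is estimated from the median statistic, and the statistics are then divided by that factor, which is equivalent to raising the significance threshold by the same factor. The constant, function names and the choice to floor the factor at 1 are assumptions of this sketch.

```python
from statistics import median

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(null_chi2: list[float]) -> float:
    """Estimate the inflation factor from presumed-null marker statistics."""
    # Flooring at 1 means the statistics are never inflated by the correction.
    return max(1.0, median(null_chi2) / CHI2_1DF_MEDIAN)


def genomic_control_adjust(chi2: list[float], null_chi2: list[float]) -> list[float]:
    """Deflate test statistics by the estimated inflation factor."""
    lam = inflation_factor(null_chi2)
    return [stat / lam for stat in chi2]


if __name__ == "__main__":
    # Hypothetical null statistics whose median is twice the expected value.
    null_stats = [0.2, 0.9098728, 3.1]
    print(inflation_factor(null_stats))
    print(genomic_control_adjust([10.0, 4.0], null_stats))
```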

About this article

Cite this article

Wang, W., Barratt, B., Clayton, D. et al. Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 6, 109–118 (2005). https://doi.org/10.1038/nrg1522

  • DOI: https://doi.org/10.1038/nrg1522
