Genome-wide association studies for common diseases and complex traits

Key Points

  • Genome-wide association studies are rapidly becoming feasible as an approach for identifying the genes that underlie common diseases and related quantitative traits. This strategy combines a comprehensive and unbiased survey of the genome with the power to detect common alleles with modest phenotypic effects.

  • Sets of markers for genome-wide association studies can be chosen using various criteria, but the degree to which a particular marker set actually surveys the genome should be evaluated if the label “genome-wide association” is to be applied. Empirical assessments of linkage disequilibrium patterns, such as those that are being performed in the HapMap project, will enable the selection of efficient sets of markers and the evaluation of the comprehensiveness of a given marker set.

  • Study design and interpretation of results must include appropriate statistical thresholds that take multiple-hypothesis testing into account, as can be achieved, for example, by permutation testing. Balancing the need for power to detect modest effects with the cost of genotyping large numbers of markers will probably require a multi-stage design.

  • False-positive results caused by population stratification might outnumber true associations, so stratification should be assessed and, if needed, corrected for. Alternatively, family-based designs can be used, but high-quality data are needed to avoid artifacts that are specific to these designs.

  • Gene–gene and gene–environment interactions might be common in complex traits, but unbounded searches for such interactions are unlikely to retain adequate power in studies of hundreds of thousands of markers. Either new methods will be required, or markers with individual effects will need to be identified first, followed by focused searches for interactions.

  • Genome-wide association studies are likely to become a reality in the near future. Care will be required in their design, performance, analysis and interpretation, and well-conceived pilot studies might be valuable for understanding and minimizing the pitfalls of this approach. Nevertheless, genome-wide association studies have the potential to identify many genes for common diseases and quantitative traits.
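
The permutation testing mentioned in the Key Points can be illustrated with a minimal sketch. It is not taken from the article: the function names, the allelic chi-square statistic and the max-statistic procedure are assumptions chosen for illustration. Case/control labels are shuffled many times, the largest per-marker statistic is recorded each time, and the observed maximum is compared against that null distribution, which accounts for testing many (possibly correlated) markers at once.

```python
import random


def allelic_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """Chi-square statistic for a 2x2 table of allele counts (alt/ref by case/control)."""
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A monomorphic marker (or an empty group) carries no information.
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def max_statistic(genotypes, labels):
    """Largest per-marker statistic.

    genotypes[i][j] is the alt-allele count (0, 1 or 2) of individual i at marker j;
    labels[i] is 1 for a case and 0 for a control.
    """
    cases = [g for g, y in zip(genotypes, labels) if y == 1]
    ctrls = [g for g, y in zip(genotypes, labels) if y == 0]
    best = 0.0
    for j in range(len(genotypes[0])):
        case_alt = sum(g[j] for g in cases)
        ctrl_alt = sum(g[j] for g in ctrls)
        stat = allelic_chi2(case_alt, 2 * len(cases), ctrl_alt, 2 * len(ctrls))
        best = max(best, stat)
    return best


def permutation_p_value(genotypes, labels, n_perm=200, seed=0):
    """Study-wide p-value for the strongest marker, by shuffling phenotype labels."""
    rng = random.Random(seed)
    observed = max_statistic(genotypes, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_statistic(genotypes, shuffled) >= observed:
            exceed += 1
    # Add-one correction so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

Because the genotypes stay fixed while only the labels move, the correlation between markers is preserved in every permutation. A real study would need many more permutations and a far more efficient implementation than this toy version.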


Genetic factors strongly affect susceptibility to common diseases and also influence disease-related quantitative traits. Identifying the relevant genes has been difficult, in part because each causal gene makes only a small contribution to overall heritability. Genetic association studies offer a potentially powerful approach for mapping causal genes with modest effects, but are limited because only a small number of genes can be studied at a time. Genome-wide association studies will soon become possible, and could open new frontiers in our understanding and treatment of disease. However, the execution and analysis of such studies will require great care.


Figure 1: Testing SNPs for association by direct and indirect methods.
Figure 2: Using a multistage approach to minimize sample sizes.
Figure 3: Effects of population stratification in whole-genome association studies.
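
The effect of population stratification named in Figure 3 can be sketched with a small worked example. The numbers are hypothetical and are not taken from the article or its figure: two subpopulations differ in both allele frequency and case/control ratio, and the marker has no effect on disease in either. Expected allele counts are used instead of random sampling so that the result is deterministic.

```python
def allelic_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """Chi-square statistic for a 2x2 table of allele counts (alt/ref by case/control)."""
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


# Hypothetical strata: name -> (alt-allele frequency, number of cases, number of controls).
# Within each stratum, cases and controls share the same allele frequency (no true effect).
STRATA = {"A": (0.2, 400, 100), "B": (0.6, 100, 400)}


def expected_counts(freq, n_cases, n_controls):
    """Expected allele counts (two alleles per person) under no association."""
    case_total, ctrl_total = 2 * n_cases, 2 * n_controls
    return round(freq * case_total), case_total, round(freq * ctrl_total), ctrl_total


def pooled_and_stratified(strata):
    """Return the pooled chi-square and a dict of per-stratum chi-squares."""
    per_stratum = {}
    pooled = [0, 0, 0, 0]
    for name, (freq, n_cases, n_controls) in strata.items():
        counts = expected_counts(freq, n_cases, n_controls)
        per_stratum[name] = allelic_chi2(*counts)
        pooled = [p + c for p, c in zip(pooled, counts)]
    return allelic_chi2(*pooled), per_stratum


pooled_stat, stratum_stats = pooled_and_stratified(STRATA)
print("pooled chi-square:", pooled_stat)
print("within-stratum chi-squares:", stratum_stats)
```

Pooling the two groups produces a large statistic even though the statistic is zero within each group, because cases are drawn mostly from the low-frequency subpopulation and controls mostly from the high-frequency one. Analysing within ancestry groups, or using family-based designs, removes this particular artifact.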


  1. 1

    International human genome sequencing consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

  2. 2

    Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

    CAS  Article  Google Scholar 

  3. 3

    Sachidanandam, R. et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  4. 4

    Gibbs, R. A. et al. The international HapMap project. Nature 426, 789–796 (2003). A description of the HapMap project, which will empirically determine LD patterns across the human genome, allowing the efficient selection of SNPs for genome-wide association studies.

    CAS  Google Scholar 

  5. 5

    Weiss, K. M. & Terwilliger, J. D. How many diseases does it take to map a gene with SNPs? Nature Genet. 26, 151–157 (2000).

    CAS  PubMed  Google Scholar 

  6. 6

    Blangero, J. Localization and identification of human quantitative trait loci: king harvest has surely come. Curr. Opin. Genet. Dev. 14, 233–240 (2004).

    CAS  PubMed  Google Scholar 

  7. 7

    McKeigue, P. M. Mapping genes that underlie ethnic differences in disease risk: methods for detecting linkage in admixed populations, by conditioning on parental admixture. Am. J. Hum. Genet. 63, 241–251 (1998).

    CAS  PubMed  PubMed Central  Google Scholar 

  8. 8

    Patterson, N. et al. Methods for high-density admixture mapping of disease genes. Am. J. Hum. Genet. 74, 979–1000 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  9. 9

    Hoggart, C. J., Shriver, M. D., Kittles, R. A., Clayton, D. G. & McKeigue, P. M. Design and analysis of admixture mapping studies. Am. J. Hum. Genet. 74, 965–978 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  10. 10

    Zhu, X., Cooper, R. S. & Elston, R. C. Linkage analysis of a complex disease through use of admixed populations. Am. J. Hum. Genet. 74, 1136–1153 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  11. 11

    Jimenez-Sanchez, G., Childs, B. & Valle, D. Human disease genes. Nature 409, 853–855 (2001).

    CAS  PubMed  Google Scholar 

  12. 12

    Pritchard, J. K. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69, 124–137 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  13. 13

    Reich, D. E. & Lander, E. S. On the allelic spectrum of human disease. Trends Genet. 17, 502–510 (2001).

    CAS  Google Scholar 

  14. 14

    Hugot, J. P. et al. Association of NOD2 leucine-rich repeat variants with susceptibility to Crohn's disease. Nature 411, 599–603 (2001).

    CAS  Google Scholar 

  15. 15

    Ogura, Y. et al. A frameshift mutation in NOD2 associated with susceptibility to Crohn's disease. Nature 411, 603–606 (2001).

    CAS  Google Scholar 

  16. 16

    Rioux, J. D. et al. Genetic variation in the 5q31 cytokine gene cluster confers susceptibility to Crohn disease. Nature Genet. 29, 223–228 (2001).

    CAS  Google Scholar 

  17. 17

    Stoll, M. et al. Genetic variation in DLG5 is associated with inflammatory bowel disease. Nature Genet. 36, 476–480 (2004).

    CAS  PubMed  Google Scholar 

  18. 18

    Stefansson, H. et al. Neuregulin 1 and susceptibility to schizophrenia. Am. J. Hum. Genet. 71, 877–892 (2002).

    PubMed  PubMed Central  Google Scholar 

  19. 19

    Nistico, L. et al. The CTLA-4 gene region of chromosome 2q33 is linked to, and associated with, type 1 diabetes. Hum. Mol. Genet. 5, 1075–1080 (1996).

    CAS  PubMed  Google Scholar 

  20. 20

    Altmuller, J., Palmer, L. J., Fischer, G., Scherb, H. & Wjst, M. Genomewide scans of complex human diseases: true linkage is hard to find. Am. J. Hum. Genet. 69, 936–950 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  21. 21

    Daly, M. J. & Rioux, J. D. New approaches to gene hunting in IBD. Inflamm. Bowel Dis. 10, 312–317 (2004).

    Google Scholar 

  22. 22

    Evans, D. M. & Cardon, L. R. Guidelines for genotyping in genomewide linkage studies: single-nucleotide-polymorphism maps versus microsatellite maps. Am. J. Hum. Genet. 75, 687–692 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  23. 23

    Enhancing linkage analysis of complex disorders: an evaluation of high-density genotyping. Hum. Mol. Genet. 13, 1943–1949 (2004).

  24. 24

    John, S. et al. Whole-genome scan, in a complex disease, using 11,245 single-nucleotide polymorphisms: comparison with microsatellites. Am. J. Hum. Genet. 75, 54–64 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  25. 25

    Middleton, F. A. et al. Genomewide linkage analysis of bipolar disorder by use of a high-density single-nucleotide-polymorphism (SNP) genotyping assay: a comparison with microsatellite marker assays and finding of significant linkage to chromosome 6q22. Am. J. Hum. Genet. 74, 886–897 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  26. 26

    Levy, D. et al. Evidence for a gene influencing blood pressure on chromosome 17. Genome scan linkage results for longitudinal blood pressure phenotypes in subjects from the Framingham Heart Study. Hypertension 36, 477–483 (2000).

    CAS  PubMed  Google Scholar 

  27. 27

    Cox, N. J. et al. Seven regions of the genome show evidence of linkage to type 1 diabetes in a consensus analysis of 767 multiplex families. Am. J. Hum. Genet. 69, 820–830 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  28. 28

    Risch, N. & Merikangas, K. The future of genetic studies of complex human diseases. Science 273, 1516–1517 (1996). A discussion of the power of association studies versus linkage studies for common alleles of modest effect, also anticipating the requirement to take multiple-hypothesis testing into account in genome-wide association studies.

    CAS  PubMed  PubMed Central  Google Scholar 

  29. 29

    Risch, N. J. Searching for genetic determinants in the new millennium. Nature 405, 847–856 (2000).

    CAS  PubMed  PubMed Central  Google Scholar 

  30. 30

    Cardon, L. R. & Bell, J. I. Association study designs for complex diseases. Nature Rev. Genet. 2, 91–99 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  31. 31

    Tabor, H. K., Risch, N. J. & Myers, R. M. Candidate-gene approaches for studying complex genetic traits: practical considerations. Nature Rev. Genet. 3, 391–397 (2002).

    CAS  PubMed  PubMed Central  Google Scholar 

  32. 32

    Wang, W. Y. S., Barratt, B. J., Clayton, D. G. & Todd, J. A. Genome-wide association studies: theoretical and practical concerns. Nature Rev. Genet. 6, 109–118 (2005).

    CAS  Google Scholar 

  33. 33

    Harris, H. The Principle of Human Biochemical Genetics 211–242 (American Elsevier Publishing Company, New York, 1970).

    Google Scholar 

  34. 34

    Chakravarti, A. Population genetics — making sense out of sequence. Nature Genet. 21, 56–60 (1999).

    CAS  PubMed  PubMed Central  Google Scholar 

  35. 35

    Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet. 22, 231–238 (1999).

    CAS  PubMed  PubMed Central  Google Scholar 

  36. 36

    Halushka, M. K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nature Genet. 22, 239–247 (1999).

    CAS  PubMed  PubMed Central  Google Scholar 

  37. 37

    Ioannidis, J. P., Ntzani, E. E., Trikalinos, T. A. & Contopoulos-Ioannidis, D. G. Replication validity of genetic association studies. Nature Genet. 29, 306–309 (2001).

    CAS  Google Scholar 

  38. 38

    Lohmueller, K. E., Pearce, C. L., Pike, M., Lander, E. S. & Hirschhorn, J. N. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease. Nature Genet. 33, 177–182 (2003). A meta-analysis of association studies between common variants and common diseases, which indicates that a fraction (but much fewer than half) of reported associations are correct. Modest effects are the rule, indicating the need for large sample sizes.

    CAS  Google Scholar 

  39. 39

    Gloyn, A. L. et al. Large-scale association studies of variants in genes encoding the pancreatic α-cell KATP channel subunits Kir6.2 (KCNJ11) and SUR1 (ABCC8) confirm that the KCNJ11 E23K variant is associated with type 2 diabetes. Diabetes 52, 568–572 (2003).

    CAS  PubMed  Google Scholar 

  40. 40

    Florez, J. C. et al. Haplotype structure and genotype-phenotype correlations of the sulfonylurea receptor and the islet ATP-sensitive potassium channel gene region. Diabetes 53, 1360–1368 (2004).

    CAS  PubMed  Google Scholar 

  41. 41

    Altshuler, D. et al. The common PPARG Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nature Genet. 26, 76–80 (2000) This study uses large sample sizes to demonstrate a modest but consistent association between a missense polymorphism in a candidate gene and type 2 diabetes.

    CAS  PubMed  Google Scholar 

  42. 42

    Stefansson, H. et al. Association of neuregulin 1 with schizophrenia confirmed in a Scottish population. Am. J. Hum. Genet. 72, 83–87 (2003).

    CAS  PubMed  Google Scholar 

  43. 43

    Yang, J. Z. et al. Association study of neuregulin 1 gene with schizophrenia. Mol. Psychiatry 8, 706–709 (2003).

    CAS  PubMed  Google Scholar 

  44. 44

    Ueda, H. et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature 423, 506–511 (2003). By testing many variants in large samples, and using logistic regression, this study shows that a 3′ UTR variant is more strongly associated with autoimmune diseases than the previously studied missense variant in the same gene.

    CAS  Google Scholar 

  45. 45

    Negoro, K. et al. Analysis of the IBD5 locus and potential gene–gene interactions in Crohn's disease. Gut 52, 541–546 (2003).

    CAS  PubMed  PubMed Central  Google Scholar 

  46. 46

    Giallourakis, C. et al. IBD5 is a general risk factor for inflammatory bowel disease: replication of association with Crohn disease and identification of a novel association with ulcerative colitis. Am. J. Hum. Genet. 73, 205–211 (2003).

    CAS  PubMed  PubMed Central  Google Scholar 

  47. 47

    Lindgren, C. & Hirschhorn, J. Genetics of type 2 diabetes. Endocrinologist 11, 178–187 (2001).

    Google Scholar 

  48. 48

    Florez, J. C., Hirschhorn, J. & Altshuler, D. The inherited basis of diabetes mellitus: implications for the genetic analysis of complex traits. Annu. Rev. Genomics Hum. Genet. 4, 257–291 (2003).

    CAS  PubMed  Google Scholar 

  49. 49

    Vaisse, C. et al. Melanocortin-4 receptor mutations are a frequent and heterogeneous cause of morbid obesity. J. Clin. Invest. 106, 253–262 (2000).

    CAS  PubMed  PubMed Central  Google Scholar 

  50. 50

    Hirschhorn, J. N. & Altshuler, D. Once and again — issues surrounding replication in genetic association studies. J. Clin. Endocrinol. Metab. 87, 4438–4441 (2002).

    CAS  PubMed  Google Scholar 

  51. 51

    Cohen, J. C. et al. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science 305, 869–872 (2004).

    CAS  PubMed  Google Scholar 

  52. 52

    Carlson, C. S., Eberle, M. A., Kruglyak, L. & Nickerson, D. A. Mapping complex disease loci in whole-genome association studies. Nature 429, 446–452 (2004). A useful and clear recent review of genome-wide association studies.

    CAS  PubMed  Google Scholar 

  53. 53

    Hirschhorn, J. N., Lohmueller, K., Byrne, E. & Hirschhorn, K. A comprehensive review of genetic association studies. Genet. Med. 4, 45–61 (2002).

    CAS  Google Scholar 

  54. 54

    Kruglyak, L. & Nickerson, D. A. Variation is the spice of life. Nature Genet. 27, 234–236 (2001).

    CAS  PubMed  Google Scholar 

  55. 55

    Syvanen, A. C. Accessing genetic variation: genotyping single nucleotide polymorphisms. Nature Rev. Genet. 2, 930–942 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  56. 56

    Kruglyak, L. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nature Genet. 22, 139–144 (1999).

    CAS  Google Scholar 

  57. 57

    Jorde, L. B. Linkage disequilibrium and the search for complex disease genes. Genome Res. 10, 1435–1444 (2000).

    CAS  PubMed  Google Scholar 

  58. 58

    Daly, M. J., Rioux, J. D., Schaffner, S. F., Hudson, T. J. & Lander, E. S. High-resolution haplotype structure in the human genome. Nature Genet. 29, 229–232 (2001). The first description of long segments of strong LD with low haplotype diversity ('haplotype blocks').

    CAS  PubMed  PubMed Central  Google Scholar 

  59. 59

    Patil, N. et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294, 1719–1723 (2001). A survey of chromosome 21 that reveals long segments of LD with low haplotype diversity.

    CAS  PubMed  PubMed Central  Google Scholar 

  60. 60

    Gabriel, S. B. et al. The structure of haplotype blocks in the human genome. Science 296, 2225–2229 (2002). A survey of over 50 genomic regions that reveals long segments of LD with low haplotype diversity, including relatively large samples from multiple populations.

    CAS  PubMed  PubMed Central  Google Scholar 

  61. 61

    Johnson, G. C. et al. Haplotype tagging for the identification of common disease genes. Nature Genet. 29, 233–237 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  62. 62

    Dawson, E. et al. A first-generation linkage disequilibrium map of human chromosome 22. Nature 418, 544–548 (2002).

    CAS  PubMed  Google Scholar 

  63. 63

    Crawford, D. C. et al. Haplotype diversity across 100 candidate genes for inflammation, lipid metabolism, and blood pressure regulation in two populations. Am. J. Hum. Genet. 74, 610–622 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  64. 64

    Goldstein, D. B., Ahmadi, K. R., Weale, M. E. & Wood, N. W. Genome scans and candidate gene approaches in the study of common diseases and variable drug responses. Trends Genet. 19, 615–622 (2003).

    CAS  PubMed  Google Scholar 

  65. 65

    Zhang, K., Deng, M., Chen, T., Waterman, M. S. & Sun, F. A dynamic programming algorithm for haplotype block partitioning. Proc. Natl Acad. Sci. USA 99, 7335–7339 (2002).

    CAS  PubMed  Google Scholar 

  66. 66

    Stram, D. O. et al. Choosing haplotype-tagging SNPS based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the Multiethnic Cohort Study. Hum. Hered. 55, 27–36 (2003).

    PubMed  PubMed Central  Google Scholar 

  67. 67

    Ke, X. & Cardon, L. R. Efficient selective screening of haplotype tag SNPs. Bioinformatics 19, 287–288 (2003).

    CAS  PubMed  Google Scholar 

  68. 68

    Weale, M. E. et al. Selection and evaluation of tagging SNPs in the neuronal-sodium-channel gene SCN1A: implications for linkage-disequilibrium gene mapping. Am. J. Hum. Genet. 73, 551–565 (2003).

    CAS  PubMed  PubMed Central  Google Scholar 

  69. 69

    Carlson, C. S. et al. Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am. J. Hum. Genet. 74, 106–120 (2004).

    CAS  PubMed  Google Scholar 

  70. 70

    Halldorsson, B. V. et al. Optimal haplotype block-free selection of tagging SNPs for genome-wide association studies. Genome Res. 14, 1633–1640 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  71. 71

    Botstein, D. & Risch, N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nature Genet. 33 Suppl. 228–237 (2003). A proposal to focus on missense SNPs in the search for the variants that underlie common disease.

    CAS  PubMed  PubMed Central  Google Scholar 

  72. 72

    Cambien, F. et al. Sequence diversity in 36 candidate genes for cardiovascular disorders. Am. J. Hum. Genet. 65, 183–191 (1999).

    CAS  PubMed  PubMed Central  Google Scholar 

  73. 73

    Shendure, J., Mitra, R. D., Varma, C. & Church, G. M. Advanced sequencing technologies: methods and goals. Nature Rev. Genet. 5, 335–344 (2004).

    CAS  PubMed  Google Scholar 

  74. 74

    Waterston, R. H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).

    CAS  PubMed  PubMed Central  Google Scholar 

  75. 75

    Loots, G. G. et al. Identification of a coordinate regulator of interleukins 4, 13, and 5 by cross-species sequence comparisons. Science 288, 136–140 (2000). The identification of functional regulatory sequences using evolutionary conservation.

    CAS  PubMed  PubMed Central  Google Scholar 

  76. 76

    Pennacchio, L. A. & Rubin, E. M. Genomic strategies to identify mammalian regulatory sequences. Nature Rev. Genet. 2, 100–109 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  77. 77

    Thomas, J. W. et al. Comparative analyses of multi-species sequences from targeted genomic regions. Nature 424, 788–793 (2003).

    CAS  Google Scholar 

  78. 78

    Nobrega, M. A., Ovcharenko, I., Afzal, V. & Rubin, E. M. Scanning human gene deserts for long-range enhancers. Science 302, 413 (2003).

    CAS  PubMed  Google Scholar 

  79. 79

    Frazer, K. A. et al. Noncoding sequences conserved in a limited number of mammals in the SIM2 interval are frequently functional. Genome Res. 14, 367–372 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  80. 80

    Boffelli, D., Nobrega, M. A. & Rubin, E. M. Comparative genomics at the vertebrate extremes. Nature Rev. Genet. 5, 456–465 (2004).

    CAS  PubMed  Google Scholar 

  81. 81

    Buetow, K. H. et al. High-throughput development and characterization of a genomewide collection of gene-based single nucleotide polymorphism markers by chip-based matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Proc. Natl Acad. Sci. USA 98, 581–584 (2001).

    CAS  PubMed  Google Scholar 

  82. 82

    De La Vega, F. M., et al. New generation pharmacogenomic tools: a SNP linkage disequilibrium map, validated SNP assay resource, and high-throughput instrumentation system for large-scale genetic studies. Biotechniques (Suppl.), 48–50, 52, 54 (2002).

  83. 83

    Matsuzaki, H. et al. Parallel genotyping of over 10,000 SNPs using a one-primer assay on a high-density oligonucleotide array. Genome Res. 14, 414–425 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  84. 84

    van den Oord, E. J. & Sullivan, P. F. False discoveries and models for gene discovery. Trends Genet. 19, 537–542 (2003).

    CAS  PubMed  Google Scholar 

  85. 85

    Lowe, C. E. et al. Cost-effective analysis of candidate genes using htSNPs: a staged approach. Genes Immun. 5, 301–305 (2004).

    CAS  PubMed  Google Scholar 

  86. 86

    Benjamini, Y., Drai, D., Elmer, G., Kafkafi, N. & Golani, I. Controlling the false discovery rate in behavior genetics research. Behav. Brain Res. 125, 279–284 (2001).

    CAS  PubMed  Google Scholar 

  87. 87

    Storey, J. D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl Acad. Sci. USA 100, 9440–9445 (2003).

    CAS  PubMed  Google Scholar 

  88. 88

    Dudbridge, F. & Koeleman, B. P. Efficient computation of significance levels for multiple associations in large studies of correlated data, including genomewide association studies. Am. J. Hum. Genet. 75, 424–435 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  89. 89

    Nyholt, D. R. A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am. J. Hum. Genet. 74, 765–769 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  90. 90

    Wacholder, S., Chanock, S., Garcia-Closas, M., El Ghormli, L. & Rothman, N. Assessing the probability that a positive report is false: an approach for molecular epidemiology studies. J. Natl Cancer Inst. 96, 434–442 (2004). A Bayesian perspective on the interpretation of association studies, which emphasizes the negative impact of low prior probabilities and inadequate power on the likelihood that an association is valid.

    PubMed  Google Scholar 

  91. 91

    Barratt, B. J. et al. Remapping the insulin gene/IDDM2 locus in type 1 diabetes. Diabetes 53, 1884–1889 (2004).

    CAS  PubMed  Google Scholar 

  92. 92

    Sham, P., Bader, J. S., Craig, I., O'Donovan, M. & Owen, M. DNA Pooling: a tool for large-scale association studies. Nature Rev. Genet. 3, 862–871 (2002).

    CAS  PubMed  Google Scholar 

  93. 93

    Barratt, B. J. et al. Identification of the sources of error in allele frequency estimations from pooled DNA indicates an optimal experimental design. Ann. Hum. Genet. 66, 393–405 (2002).

    CAS  PubMed  Google Scholar 

  94. 94

    Allison, D. B. Transmission-disequilibrium tests for quantitative traits. Am. J. Hum. Genet. 60, 676–690 (1997).

    CAS  PubMed  PubMed Central  Google Scholar 

  95. 95

    Rabinowitz, D. A transmission disequilibrium test for quantitative trait loci. Hum. Hered. 47, 342–350 (1997).

    CAS  PubMed  Google Scholar 

  96. 96

    Fulker, D. W., Cherny, S. S., Sham, P. C. & Hewitt, J. K. Combined linkage and association sib-pair analysis for quantitative traits. Am. J. Hum. Genet. 64, 259–267 (1999).

    CAS  PubMed  PubMed Central  Google Scholar 

  97. 97

    Abecasis, G. R., Cookson, W. O. & Cardon, L. R. Pedigree tests of transmission disequilibrium. Eur. J. Hum. Genet. 8, 545–551 (2000).

    CAS  PubMed  Google Scholar 

  98. 98

    Abecasis, G. R., Cardon, L. R. & Cookson, W. O. A general test of association for quantitative traits in nuclear families. Am. J. Hum. Genet. 66, 279–292 (2000).

    CAS  PubMed  Google Scholar 

  99. 99

    Zaykin, D. V. et al. Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum. Hered. 53, 79–91 (2002).

    PubMed  Google Scholar 

  100. 100

    Schaid, D. J., Rowland, C. M., Tines, D. E., Jacobson, R. M. & Poland, G. A. Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am. J. Hum. Genet. 70, 425–434 (2002).

    PubMed  Google Scholar 

  101. 101

    Stram, D. O. et al. Modeling and E-M estimation of haplotype-specific relative risks from genotype data for a case-control study of unrelated individuals. Hum. Hered. 55, 179–190 (2003).

    PubMed  Google Scholar 

  102. 102

    Pritchard, J. K. & Rosenberg, N. A. Use of unlinked genetic markers to detect population stratification in association studies. 65, 220–228 (1999).

  103. 103

    Devlin, B. & Roeder, K. Genomic control for association studies. Biometrics 55, 997–1004 (1999).

    CAS  Google Scholar 

  104. 104

    Reich, D. E. & Goldstein, D. B. Detecting association in a case-control study while correcting for population stratification. Am. J. Hum. Genet. 20, 4–16 (2001).

    CAS  Google Scholar 

  105. 105

    Pritchard, J. K., Stephens, M., Rosenberg, N. A. & Donnelly, P. Association mapping in structured populations. Am. J. Hum. Genet. 67, 170–181 (2000). Description of software for detecting and correcting for the presence of multiple population subgroups in an association study.

    CAS  PubMed  PubMed Central  Google Scholar 

  106. 106

    Freedman, M. L. et al. Assessing the impact of population stratification on genetic association studies. Nature Genet. 36, 388–393 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  107. 107

    Morton, N. E. & Collins, A. Tests and estimates of allelic association in complex inheritance. Proc. Natl Acad. Sci. USA 95, 11389–11393 (1998).

    CAS  PubMed  Google Scholar 

  108. 108

    Wacholder, S., Rothman, N. & Caporaso, N. Counterpoint: bias from population stratification is not a major threat to the validity of conclusions from epidemiological studies of common polymorphisms and cancer. Cancer Epidemiol. Biomarkers Prev. 11, 513–520 (2002).

    PubMed  Google Scholar 

  109. 109

    Thomas, D. C. & Witte, J. S. Point: population stratification: a problem for case-control studies of candidate-gene associations? Cancer Epidemiol. Biomarkers Prev. 11, 505–512 (2002).

    PubMed  Google Scholar 

  110. 110

    Cardon, L. R. & Palmer, L. J. Population stratification and spurious allelic association. Lancet 361, 598–604 (2003).

    PubMed  Google Scholar 

  111. 111

    Ardlie, K. G., Lunetta, K. L. & Seielstad, M. Testing for population subdivision and association in four case-control studies. Am. J. Hum. Genet. 71, 304–311 (2002).

    CAS  PubMed  PubMed Central  Google Scholar 

  112. 112

    Marchini, J., Cardon, L. R., Phillips, M. S. & Donnelly, P. The effects of human population structure on large genetic association studies. Nature Genet. 36, 512–517 (2004).

    CAS  PubMed  Google Scholar 

  113. 113

    Rosenberg, N. A., Li, L. M., Ward, R. & Pritchard, J. K. Informativeness of genetic markers for inference of ancestry. Am. J. Hum. Genet. 73, 1402–1422 (2003).

    CAS  PubMed  PubMed Central  Google Scholar 

  114. 114

    Spielman, R. S. & Ewens, W. J. The TDT and other family-based tests for linkage disequilibrium and association. Am. J. Hum. Genet. 59, 983–989 (1996).

    CAS  PubMed  PubMed Central  Google Scholar 

  115. 115

    Frayling, T. M. et al. Parent-offspring trios: a resource to facilitate the identification of type 2 diabetes genes. Diabetes 48, 2475–2479 (1999).

    CAS  PubMed  Google Scholar 

  116. 116

    Spielman, R. S. & Ewens, W. J. A sibship test for linkage in the presence of association: the sib transmission/ disequilibrium test. Am. J. Hum. Genet. 62, 450–458 (1998).

    CAS  PubMed  PubMed Central  Google Scholar 

  117. 117

    Horvath, S. & Laird, N. M. A discordant-sibship test for disequilibrium and linkage: no need for parental data. Am. J. Hum. Genet. 63, 1886–1897 (1998).

    CAS  PubMed  PubMed Central  Google Scholar 

  118. 118

    Boehnke, M. & Langefeld, C. D. Genetic association mapping based on discordant sib pairs: the discordant-alleles test. Am. J. Hum. Genet. 62, 950–961 (1998).

    CAS  PubMed  PubMed Central  Google Scholar 

  119. 119

    Martin, E. R., Monks, S. A., Warren, L. L. & Kaplan, N. L. A test for linkage and association in general pedigrees: the pedigree disequilibrium test. Am. J. Hum. Genet. 67, 146–154 (2000).

    CAS  PubMed  PubMed Central  Google Scholar 

  120. 120

    Lazzeroni, L. C. Allele sharing and allelic association I: sib pair tests with increased power. Genet. Epidemiol. 22, 328–344 (2002).

    PubMed  Google Scholar 

  121. 121

    Mitchell, A. A., Cutler, D. J. & Chakravarti, A. Undetected genotyping errors cause apparent overtransmission of common alleles in the transmission/disequilibrium test. Am. J. Hum. Genet. 72, 598–610 (2003).

    CAS  PubMed  PubMed Central  Google Scholar 

  122. 122

    Gordon, D. et al. A transmission disequilibrium test for general pedigrees that is robust to the presence of random genotyping errors and any number of untyped parents. Eur. J. Hum. Genet. 12, 752–761 (2004).

    CAS  PubMed  PubMed Central  Google Scholar 

  123. 123

    Cordell, H. J. Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans. Hum. Mol. Genet. 11, 2463–2468 (2002). A discussion of epistasis, including the usefulness of searching first for main effects.

    CAS  PubMed  Google Scholar 

  124. 124

    Leal, S. M. & Ott, J. Effects of stratification in the analysis of affected-sib-pair data: benefits and costs. Am. J. Hum. Genet. 66, 567–575 (2000).

    CAS  PubMed  PubMed Central  Google Scholar 

  125. 125

    Cordell, H. J., Wedig, G. C., Jacobs, K. B. & Elston, R. C. Multilocus linkage tests based on affected relative pairs. Am. J. Hum. Genet. 66, 1273–1286 (2000).

    CAS  PubMed  PubMed Central  Google Scholar 

  126. 126

    Cordell, H. J. et al. Statistical modeling of interlocus interactions in a complex disease: rejection of the multiplicative model of epistasis in type 1 diabetes. Genetics 158, 357–367 (2001).

    CAS  PubMed  PubMed Central  Google Scholar 

  127. 127

    Ritchie, M. D. et al. Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am. J. Hum. Genet. 69, 138–147 (2001).

  128.

    Hoh, J. & Ott, J. Mathematical multi-locus approaches to localizing complex human trait genes. Nature Rev. Genet. 4, 701–709 (2003).

  129.

    Singer, J. B. et al. Genetic dissection of complex traits with chromosome substitution strains of mice. Science 304, 445–448 (2004).

  130.

    Ozaki, K. et al. Functional SNPs in the lymphotoxin-α gene that are associated with susceptibility to myocardial infarction. Nature Genet. 32, 650–654 (2002).

  131.

    Kamatani, N. et al. Large-scale single-nucleotide polymorphism (SNP) and haplotype analyses, using dense SNP Maps, of 199 drug-related genes in 752 subjects: the analysis of the association between uncommon SNPs within haplotype blocks and the haplotypes constructed with haplotype-tagging SNPs. Am. J. Hum. Genet. 75, 190–203 (2004).

  132.

    Lin, S., Chakravarti, A. & Cutler, D. J. Exhaustive allelic transmission disequilibrium tests as a new approach to genome-wide association studies. Nature Genet. 36, 1181–1188 (2004).

  133.

    Vermeire, S. et al. CARD15 genetic variation in a Quebec population: prevalence, genotype-phenotype relationship, and haplotype structure. Am. J. Hum. Genet. 71, 74–83 (2002).

  134.

    Kruglyak, L. Genetic isolates: separate but equal? Proc. Natl Acad. Sci. USA 96, 1170–1172 (1999).

  135.

    Shifman, S., Kuypers, J., Kokoris, M., Yakir, B. & Darvasi, A. Linkage disequilibrium patterns of the human genome across populations. Hum. Mol. Genet. 12, 771–776 (2003).

  136.

    Kaessmann, H. et al. Extensive linkage disequilibrium in small human populations in Eurasia. Am. J. Hum. Genet. 70, 673–685 (2002).



Acknowledgements

We thank David Altshuler, Paul DeBakker, Chris Newton-Cheh and Nick Patterson for useful discussions. J.N.H. is the recipient of a Burroughs Wellcome Career Award in Biomedical Science and a Smith Family Foundation New Investigator Award.

Author information



Corresponding author

Correspondence to Joel N. Hirschhorn.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

International HapMap Project

dbSNP database

The ENCODE project

ParAllele MegAllele genotyping products

Perlegen Whole Genome Scanning

Affymetrix gene chip arrays

Glossary



A genetic variant is genotyped in a population for which phenotypic information is available (such as disease occurrence, or a range of different trait values). If a correlation is observed between genotype and phenotype, there is said to be an association between the variant and the disease or trait.


A biological trait that shows continuous variation (such as height) rather than falling into distinct categories (such as diabetic or healthy). The genetic basis of these traits generally involves the effects of multiple genes and gene–environment interactions. Examples of quantitative traits that contribute to disease are body mass index, blood pressure and blood lipid levels.


A gene for which there is evidence of its possible role in the trait or disease that is under study.


Where genes are mapped by typing genetic markers in families to identify regions that are associated with disease or trait values within pedigrees more often than is expected by chance. Such linked regions are more likely to contain a causal genetic variant.


Predicting the recent ancestry of chromosomal segments across the genome to identify regions for which recent ancestry in a particular population correlates with disease or trait values. Such regions are more likely to contain causal variants that are more common in the ancestral population.


The proportion of individuals with a specific genotype who manifest the genotype at the phenotypic level. For example, if all individuals with a specific disease genotype show the disease phenotype, then the genotype is said to be 'completely penetrant'.


The proportion of the variation in a given characteristic or state that can be attributed to (additive) genetic factors.


Correlation between nearby variants such that the alleles at neighbouring markers (observed on the same chromosome) are associated within a population more often than if they were unlinked.
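
As a rough illustration of the correlation described above, the following sketch computes three commonly used pairwise summaries (D, D′ and r²) from counts of phased two-locus haplotypes. The function name, argument names and the idea of supplying four haplotype counts are assumptions made for this example; they are not taken from the article.

```python
def ld_stats(hap_AB, hap_Ab, hap_aB, hap_ab):
    """Return (D, D_prime, r_squared) for two biallelic markers.

    Hypothetical helper: the four arguments are counts of phased
    haplotypes carrying alleles A/a at marker 1 and B/b at marker 2.
    """
    total = hap_AB + hap_Ab + hap_aB + hap_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")
    p_AB = hap_AB / total
    p_A = (hap_AB + hap_Ab) / total  # frequency of allele A
    p_B = (hap_AB + hap_aB) / total  # frequency of allele B

    # D: deviation of the observed AB frequency from independence.
    d = p_AB - p_A * p_B

    # D': D scaled by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2: squared correlation between the alleles at the two markers.
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared


# Example with made-up counts: alleles A and B usually travel together.
d, d_prime, r_squared = ld_stats(hap_AB=40, hap_Ab=10, hap_aB=10, hap_ab=40)
```

Values of r² close to 1 indicate that one marker is a near-perfect proxy for the other, whereas values close to 0 indicate that the two markers carry largely independent information.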


A sequential set of genetic markers that are present on the same chromosome.


Single nucleotide polymorphisms that are correlated with, and therefore can serve as a proxy for, much of the known remaining common variation in a region.
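
One simple way to choose such proxies is a greedy selection over pairwise r² values. The sketch below is only an illustrative heuristic, not the procedure used by the authors or by the HapMap project; the function name, the matrix input and the threshold of 0.8 are assumptions.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Pick tag SNPs greedily from a square matrix of pairwise r^2 values.

    Hypothetical helper: r2[i][j] is the r^2 between SNP i and SNP j
    (with r2[i][i] == 1). A SNP counts as captured once some chosen tag
    has r^2 >= threshold with it.
    """
    untagged = set(range(len(r2)))
    tags = []
    while untagged:
        # Choose the untagged SNP that captures the most untagged SNPs
        # (itself included); ties go to the lowest index.
        best = max(
            sorted(untagged),
            key=lambda i: sum(1 for j in untagged if r2[i][j] >= threshold),
        )
        tags.append(best)
        # Keep only the SNPs that the new tag does not capture.
        untagged = {j for j in untagged if j != best and r2[best][j] < threshold}
    return tags


# Made-up example: SNPs 0-2 are mutually correlated, SNP 3 is independent.
example_r2 = [
    [1.0, 0.9, 0.9, 0.1],
    [0.9, 1.0, 0.9, 0.1],
    [0.9, 0.9, 1.0, 0.1],
    [0.1, 0.1, 0.1, 1.0],
]
tags = greedy_tag_snps(example_r2)
```

A greedy pass does not guarantee the smallest possible tag set, but it shows how correlation among markers lets a few genotyped SNPs stand in for many.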


A consequence of collecting a nonrandom subsample with a systematic bias, so that results based on the subsample are not representative of the entire sample.


Testing more than one hypothesis within an experiment. As a result, the probability of an unusual result from within the entire experiment occurring by chance is higher than the individual p-value associated with that result.


The simplest correction of individual p-values for multiple-hypothesis testing: $p_{\text{corrected}} = 1 - (1 - p_{\text{uncorrected}})^{n}$, where $n$ is the number of hypotheses tested. This formula assumes that the hypotheses are all independent, and simplifies to $p_{\text{corrected}} = n\,p_{\text{uncorrected}}$ when $n\,p_{\text{uncorrected}} \ll 1$.
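
The exact formula and its small-p simplification can be compared numerically. The sketch below assumes independent tests, as the definition states; the function names and the example of 500,000 markers are hypothetical.

```python
import math


def corrected_p_independent(p_uncorrected, n_tests):
    """Exact correction 1 - (1 - p)^n for n independent hypotheses."""
    if not 0.0 <= p_uncorrected <= 1.0 or n_tests < 1:
        raise ValueError("need 0 <= p <= 1 and n_tests >= 1")
    if p_uncorrected == 1.0:
        return 1.0
    # log1p/expm1 keep precision when p is tiny and n is large.
    return -math.expm1(n_tests * math.log1p(-p_uncorrected))


def corrected_p_approx(p_uncorrected, n_tests):
    """Simplified correction n * p, capped at 1."""
    return min(1.0, n_tests * p_uncorrected)


# Hypothetical genome-wide scan: 500,000 markers, best single-marker p = 1e-8.
exact = corrected_p_independent(1e-8, 500_000)
approx = corrected_p_approx(1e-8, 500_000)
```

When the product of the number of tests and the uncorrected p-value is small, the two values are nearly identical; the simplified form is never smaller than the exact one, so it is the more conservative of the two.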


A measure of relative risk that is usually estimated from case-control studies.
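
For a case-control study summarized as a 2×2 table, this measure can be sketched as follows. The function name, argument names and example counts are hypothetical, and the confidence interval uses the standard large-sample (Woolf) approximation on the log scale.

```python
import math


def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Return (odds ratio, lower 95% bound, upper 95% bound) for a 2x2 table."""
    cells = (case_carriers, case_noncarriers, control_carriers, control_noncarriers)
    if any(count <= 0 for count in cells):
        raise ValueError("all four cells must be positive for this approximation")
    # Odds of carrying the variant in cases divided by the odds in controls.
    estimate = (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)
    # Standard error of log(OR) under the large-sample approximation.
    se_log = math.sqrt(sum(1.0 / count for count in cells))
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Made-up counts: 30 of 100 cases and 20 of 100 controls carry the variant.
estimate, lower, upper = odds_ratio(30, 70, 20, 80)
```

An interval that includes 1 indicates that the data are compatible with no association at the conventional 5% level.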


A statistical approach for assessing the likelihood that a hypothesis is correct (such as an association being valid), by assessing the strength of the data that supports the hypothesis and the number of hypotheses that are tested.
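
One widely used procedure that weighs both the strength of each result and the number of hypotheses tested is the Benjamini-Hochberg step-up method. Whether this is the specific procedure the definition above has in mind is an assumption; the function name and the example p-values below are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses rejected at false-discovery level q.

    Step-up rule: sort the p-values, find the largest rank k with
    p_(k) <= q * k / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Made-up p-values from ten association tests.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(example_p, q=0.05)
```

Unlike a correction that controls the chance of any false positive, this rule controls the expected proportion of false positives among the reported results, which is usually less stringent when many hypotheses are tested.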


A statistical approach that assesses the probability of a hypothesis being correct (for example, whether an association is valid) by incorporating the prior probability of the hypothesis and the experimental data supporting the hypothesis.
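
A minimal worked example of this idea, assuming the data are reduced to a single "significant at level alpha" outcome: Bayes' rule then combines the prior probability, the power of the study and the significance threshold. The function name and the numerical inputs are hypothetical and chosen only for illustration.

```python
def posterior_true_association(prior, power, alpha):
    """Posterior probability that an association is real, given a significant test.

    prior: probability that the association is real before seeing the data
    power: probability of a significant result if the association is real
    alpha: probability of a significant result if it is not real
    """
    for name, value in (("prior", prior), ("power", power), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_positive = prior * power
    false_positive = (1.0 - prior) * alpha
    if true_positive + false_positive == 0.0:
        raise ValueError("a significant result has zero probability under these inputs")
    return true_positive / (true_positive + false_positive)


# Well-supported candidate (assumed prior 0.5) versus a random marker (assumed prior 1e-4).
candidate = posterior_true_association(prior=0.5, power=0.8, alpha=0.05)
random_marker = posterior_true_association(prior=1e-4, power=0.8, alpha=0.05)
```

With the same test result, the posterior probability is high for the well-supported candidate and very low for the random marker, which is why a nominal p-value of 0.05 carries little weight in a genome-wide setting.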


Populations that have been derived from a limited pool of individuals within the last 100 or fewer generations.


Combining two or more populations into a single group. This has implications for studies of genotype–disease associations if the component populations have different genotypic distributions.
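
The following sketch shows how pooling can create an apparent association when none exists within either component population. It uses expected counts rather than simulated data; the function name and all frequencies are made up for this example.

```python
def pooled_odds_ratio(populations):
    """Odds ratio in the pooled sample when genotype and disease are
    independent WITHIN each population.

    populations: list of (size, carrier_frequency, disease_risk) tuples.
    """
    carrier_cases = noncarrier_cases = 0.0
    carrier_controls = noncarrier_controls = 0.0
    for size, carrier_freq, risk in populations:
        # Expected counts under within-population independence.
        carrier_cases += size * carrier_freq * risk
        noncarrier_cases += size * (1 - carrier_freq) * risk
        carrier_controls += size * carrier_freq * (1 - risk)
        noncarrier_controls += size * (1 - carrier_freq) * (1 - risk)
    return (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)


# One population alone: no association by construction.
single = pooled_odds_ratio([(1000, 0.3, 0.1)])

# Two populations that differ in both carrier frequency and disease risk.
mixed = pooled_odds_ratio([(1000, 0.1, 0.05), (1000, 0.5, 0.2)])
```

The single-population odds ratio is 1, whereas the pooled odds ratio is well above 1 even though the variant has no effect in either group. The spurious signal arises only when the component populations differ in both genotype frequency and disease risk.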


A family-based association approach that uses only sibs who are phenotypically discordant (that is, different). Like the transmission disequilibrium test, this approach is immune to population stratification.


A family-based test for association that is immune to population stratification. The transmission of alleles from heterozygous parents to affected offspring is compared to the expected 1:1 ratio.
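
The comparison with the expected 1:1 ratio is commonly carried out with a McNemar-type chi-square statistic on one degree of freedom. The sketch below assumes that form of the test; the function name and the example counts are hypothetical.

```python
import math


def transmission_test(transmitted, untransmitted):
    """Chi-square statistic and p-value for deviation from 1:1 transmission.

    transmitted / untransmitted: number of times heterozygous parents
    passed on, or did not pass on, the allele of interest to affected offspring.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi_square = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Made-up data: the allele was transmitted 60 times and not transmitted 40 times.
chi_square, p_value = transmission_test(60, 40)
```

Because only transmissions from heterozygous parents are counted, each family acts as its own control, which is the source of the robustness to population stratification noted in the definition.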


The binomial distribution of genotypes in a population, such that frequencies of genotypes AA, Aa and aa will be $p^2$, $2pq$ and $q^2$, respectively, where $p$ is the frequency of allele A, and $q$ is the frequency of allele a. Hardy–Weinberg equilibrium applies in a population when there are no factors such as migration or admixture that cause deviations from $p^2$, $2pq$ and $q^2$.
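
Departures from these expected proportions are often checked with a chi-square goodness-of-fit test on one degree of freedom, which is also a common genotyping quality-control step. The sketch below assumes that test; the function name and example counts are hypothetical.

```python
import math


def hardy_weinberg_test(n_AA, n_Aa, n_aa):
    """Chi-square statistic and p-value comparing observed genotype counts
    with the Hardy-Weinberg expectations."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("no genotypes supplied")
    freq_A = (2 * n_AA + n_Aa) / (2 * n)  # allele frequency of A
    freq_a = 1.0 - freq_A                 # allele frequency of a
    expected = (n * freq_A ** 2, 2 * n * freq_A * freq_a, n * freq_a ** 2)
    observed = (n_AA, n_Aa, n_aa)
    chi_square = sum(
        (obs - exp) ** 2 / exp for obs, exp in zip(observed, expected) if exp > 0
    )
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Made-up counts with a marked deficit of heterozygotes.
chi_square, p_value = hardy_weinberg_test(n_AA=40, n_Aa=20, n_aa=40)
```

For small samples or rare alleles an exact test is generally preferred to the chi-square approximation used here.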


In statistical genetics, this term refers to an interaction of multiple genetic variants (usually at different loci) such that the net phenotypic effect of carrying more than one variant is different than would be predicted by simply combining the effects of each individual variant (mathematically, this means that the gene–gene interaction is significant).
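
What counts as "simply combining" effects depends on the scale chosen. As one possible illustration, the sketch below assumes a multiplicative odds scale and compares the odds ratio for carrying both variants with the product of the two single-variant odds ratios. The function name, the dictionary keys and all counts are hypothetical.

```python
def interaction_on_odds_scale(counts):
    """Return (OR_a, OR_b, OR_both, interaction_ratio).

    counts maps 'neither', 'a_only', 'b_only' and 'both' to
    (cases, controls) tuples. An interaction_ratio of 1 means the joint
    effect equals the product of the single-variant odds ratios.
    """
    def odds(cases, controls):
        if cases <= 0 or controls <= 0:
            raise ValueError("all counts must be positive")
        return cases / controls

    baseline = odds(*counts["neither"])
    or_a = odds(*counts["a_only"]) / baseline
    or_b = odds(*counts["b_only"]) / baseline
    or_both = odds(*counts["both"]) / baseline
    return or_a, or_b, or_both, or_both / (or_a * or_b)


# Made-up table in which carrying both variants doubles the odds beyond
# what the product of the individual odds ratios would predict.
example_counts = {
    "neither": (100, 100),
    "a_only": (150, 100),
    "b_only": (200, 100),
    "both": (600, 100),
}
or_a, or_b, or_both, ratio = interaction_on_odds_scale(example_counts)
```

A formal analysis would test whether the ratio differs significantly from 1, for example with an interaction term in a logistic regression; the point estimate here only illustrates the definition.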


An approach that attempts to reduce the number of tests required to search for interactions between multiple variables.


About this article

Cite this article

Hirschhorn, J., Daly, M. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 6, 95–108 (2005).
