Key Points
-
Recent improvements in genotyping technology and in our knowledge of human genetic variation have made it possible to carry out genome-wide genetic association studies to identify susceptibility genes for common disease. However, before such studies are undertaken, it is important to know what proportion of human genetic variation they are likely to survey and what the likely costs will be per true-positive result.
-
The allelic spectrum of complex diseases (that is, the range of frequencies and effect sizes of susceptibility loci) will influence the success of genome-wide association studies. Large sample sizes will be required to search even the most accessible end of this spectrum, namely alleles with population allele frequencies exceeding 0.1 and odds ratios of 1.3 or above.
-
The existence of extensive regions of linkage disequilibrium in the human genome will greatly reduce the cost of genotyping in genome-wide association studies, because it allows the use of tag SNPs, each of which provides information on a number of correlated SNPs that are not genotyped directly.
-
However, as much as 30% of the common variants in the genome remain unknown, and the latest high-throughput technologies convert only about 50% of SNPs into robust assays. In addition, extensive resequencing and genotyping will be needed to ensure that SNPs in regions of low linkage disequilibrium are surveyed comprehensively.
-
Other factors that need to be taken into account in the design and execution of initial genome-wide association studies, to avoid loss of statistical power, include possible selection bias, population substructure and misclassification errors.
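As a rough illustration of why even the "accessible" end of the allelic spectrum needs large samples, the sketch below applies the standard normal-approximation formula for comparing two allele frequencies. It is a minimal sketch, assuming a multiplicative (per-allele) model, Hardy-Weinberg equilibrium, equal numbers of cases and controls, and that the quoted allele frequency can stand in for the control frequency (reasonable for a rare disease). The function names and the default genome-wide threshold alpha = 1e-7 are illustrative assumptions, not values taken from this article.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency and an
    allelic odds ratio (odds in cases = OR * odds in controls)."""
    odds_cases = odds_ratio * p_control / (1.0 - p_control)
    return odds_cases / (1.0 + odds_cases)


def cases_needed(p_control: float, odds_ratio: float,
                 alpha: float = 1e-7, power: float = 0.8) -> int:
    """Approximate number of cases (with as many controls) for a two-sided
    allelic test. Assumes each person contributes two independent alleles
    (Hardy-Weinberg equilibrium) and a multiplicative risk model."""
    p0 = p_control
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)               # desired power
    # Normal-approximation sample size for comparing two proportions,
    # expressed as the number of alleles (chromosomes) per group.
    numerator = (z_alpha * sqrt(2.0 * p_bar * (1.0 - p_bar))
                 + z_beta * sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return ceil(alleles_per_group / 2.0)  # two alleles per individual


if __name__ == "__main__":
    for orr in (1.3, 1.5, 2.0):
        print(f"allele frequency 0.1, OR {orr}: ~{cases_needed(0.1, orr)} cases")
```

Because the required size scales with 1/(p1 - p0)^2, modest odds ratios push it up quickly. When the causal variant is captured only indirectly by a tag SNP with r^2 < 1, the requirement grows further, roughly by a factor of 1/r^2.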
Abstract
To fully understand the allelic variation that underlies common diseases, complete genome sequences from many individuals with and without disease are required. This is still not technically feasible. However, it has recently become possible to carry out partial surveys of the genome by genotyping large numbers of common SNPs in genome-wide association studies. Here, we outline the main factors (including models of the allelic architecture of common diseases, sample size, map density and sample-collection biases) that need to be taken into account in order to optimize the cost efficiency of identifying genuine disease-susceptibility loci.
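To make map density and tag SNPs concrete, the sketch computes pairwise linkage disequilibrium (r^2) from phased haplotypes and then chooses tags greedily, so that every SNP has r^2 at or above a threshold with at least one tag. This is one simple pairwise-tagging strategy among several and not the authors' procedure; it assumes phased 0/1 haplotypes, and the threshold of 0.8 and the function names are illustrative assumptions.

```python
from itertools import combinations
from typing import Dict, List, Set


def r_squared(haps: List[List[int]], i: int, j: int) -> float:
    """Pairwise LD r^2 between SNPs i and j from phased 0/1 haplotypes."""
    n = len(haps)
    p_i = sum(h[i] for h in haps) / n
    p_j = sum(h[j] for h in haps) / n
    p_ij = sum(1 for h in haps if h[i] == 1 and h[j] == 1) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:  # monomorphic SNP: r^2 undefined, treated here as no LD
        return 0.0
    d = p_ij - p_i * p_j  # the LD coefficient D
    return d * d / denom


def greedy_tags(haps: List[List[int]], threshold: float = 0.8) -> List[int]:
    """Choose tag SNPs so every SNP has r^2 >= threshold with some tag."""
    m = len(haps[0])
    # covers[s] = SNPs captured if s is chosen as a tag (always includes s)
    covers: Dict[int, Set[int]] = {s: {s} for s in range(m)}
    for i, j in combinations(range(m), 2):
        if r_squared(haps, i, j) >= threshold:
            covers[i].add(j)
            covers[j].add(i)
    untagged = set(range(m))
    tags: List[int] = []
    while untagged:
        # Pick the SNP covering the most untagged SNPs; ties go to the lowest index.
        best = max(range(m), key=lambda s: (len(covers[s] & untagged), -s))
        tags.append(best)
        untagged -= covers[best]
    return tags


if __name__ == "__main__":
    # Toy haplotypes: rows are chromosomes, columns are SNPs (1 = minor allele).
    haps = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1]]
    print(greedy_tags(haps))  # [0, 2]
```

In the toy data, SNPs 0 and 1 are perfectly correlated, so one of them suffices, whereas SNP 2 is uncorrelated with both and must be typed directly. In regions of low linkage disequilibrium, few pairs pass the threshold and the tag set approaches the full SNP set, which is why such regions are costly to survey.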
Acknowledgements
W.Y.S.W. received scholarships from the University of Cambridge, the University of Sydney and Gonville and Caius College, Cambridge, UK. This work was financed by the Wellcome Trust and the Juvenile Diabetes Research Foundation International.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Related links
DATABASES
OMIM
FURTHER INFORMATION
David Clayton's tag SNP web site
NCBI Single Nucleotide Polymorphism database web site
T1DBase — a genetics and bioinformatics resource for type 1 diabetes researchers
Glossary
- LINKAGE ANALYSIS
-
Mapping genes by typing genetic markers in families to identify chromosome regions that are associated with disease or trait values within pedigrees more often than expected by chance. Such linked regions are more likely to contain a causal genetic variant.
- AFFECTED SIB-PAIR (ASP) STUDIES
-
Linkage studies that are based on collecting a large number of families, each consisting of affected siblings and, if available, their parents. The analyses rely on the principle that siblings share, on average, half of their alleles identical by descent, so that excess sharing among affected sib-pairs in a chromosome region indicates linkage.
- LINKAGE DISEQUILIBRIUM
-
The non-random association of alleles of different linked polymorphisms in a population.
- MINOR ALLELE FREQUENCY
-
(MAF). The frequency of the less common allele of a polymorphic locus. It has a value that lies between 0 and 0.5, and can vary between populations.
- ODDS RATIO
-
A measure of association that is commonly used in case-control studies. It is defined as the odds of exposure to the susceptibility variant in cases divided by the corresponding odds in controls. If the odds ratio is significantly greater than one, the genetic variant is associated with increased risk of disease.
- QUANTITATIVE TRAIT LOCI
-
Genetic loci that contribute to variation in quantitative (that is, continuous) phenotypes.
- PURIFYING SELECTION
-
Evolutionary selective forces that reduce the frequency of specific polymorphisms that have phenotypic effects.
- POSITIVE SELECTION
-
The effect of evolutionary selective forces that favour certain variants and tend to increase their allele frequencies.
- GENETIC DRIFT
-
Changes in allele frequencies in a population from one generation to another as the result of chance events in mating, meiosis and number of offspring.
- SIBLING RELATIVE-RECURRENCE RISK
-
The risk of developing disease in a sibling of an affected individual relative to that of an individual in the general population. Commonly used as an indication of the heritability of a disease.
- HAPLOTYPE
-
A set of alleles that is present on a single chromosome.
- GENE CONVERSION
-
A non-reciprocal recombination process that results in an alteration of the sequence of a gene to that of its homologue during meiosis.
- SUBGROUP ANALYSES
-
In genome-wide association analyses, as in any other association study, there is a very low prior probability that any given locus or region is associated with disease. If the samples or data are divided into subgroups (for example, in analyses of epistatic interactions between loci, that is, departures from statistical independence in the joint distribution of genotypes at the loci), the prior probability of a true positive is lower still.
- POPULATION STRATIFICATION
-
The presence of several population subgroups that show limited interbreeding. When such subgroups differ both in allele frequency and in disease prevalence, this can lead to erroneous results in association studies.
- ADMIXTURE
-
The mixture of two or more genetically distinct populations.
- ANCESTRY INFORMATIVE MARKERS
-
Genetic markers that have different frequencies between populations and can be used to readily estimate the ancestral origins of a person or population.
- COHORT STUDIES
-
Observational studies in which defined groups of people (the cohorts) are followed over time and outcomes are compared in subsets of the cohort who were exposed to different levels of factors of interest. These studies can either be performed prospectively or retrospectively from historical records.
- GENOMIC CONTROL
-
A statistical genetics approach that adjusts the chi-squared threshold for statistical significance in a genetic association study to allow for the effects of population substructure.
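The odds ratio and genomic control entries above can be illustrated together. The sketch computes an allelic odds ratio and a 1-df Pearson chi-squared statistic from a 2x2 table of allele counts, then estimates the inflation factor λ as the median of the observed statistics divided by the median of a chi-squared distribution with 1 df (about 0.455), in the spirit of Devlin and Roeder's genomic control. Dividing every statistic by λ is equivalent to multiplying the significance threshold by λ. It is a minimal sketch that assumes non-zero cell counts; the function names are hypothetical.

```python
from statistics import NormalDist, median
from typing import List, Tuple

# Median of chi-squared with 1 df = (upper quartile of N(0,1))^2, about 0.455.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def allelic_test(case_a: int, case_b: int,
                 ctrl_a: int, ctrl_b: int) -> Tuple[float, float]:
    """Allelic odds ratio (allele a vs b) and Pearson chi-squared (1 df).
    Assumes all four allele counts are non-zero."""
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    n = case_a + case_b + ctrl_a + ctrl_b
    row = (case_a + case_b, ctrl_a + ctrl_b)   # cases, controls
    col = (case_a + ctrl_a, case_b + ctrl_b)   # allele a, allele b
    observed = ((case_a, case_b), (ctrl_a, ctrl_b))
    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row[r] * col[c] / n
            chi2 += (observed[r][c] - expected) ** 2 / expected
    return odds_ratio, chi2


def genomic_control(chi2_stats: List[float]) -> Tuple[float, List[float]]:
    """Estimate the inflation factor lambda (floored at 1) and deflate stats."""
    lam = max(1.0, median(chi2_stats) / CHI2_1DF_MEDIAN)
    return lam, [x / lam for x in chi2_stats]


if __name__ == "__main__":
    or_, chi2 = allelic_test(30, 70, 20, 80)
    print(f"OR = {or_:.3f}, chi2 = {chi2:.3f}")
```

In a real scan, λ would be estimated from many markers, ideally ones not expected to be associated with the disease; a λ close to 1 suggests little inflation of the test statistics from population stratification.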
Cite this article
Wang, W., Barratt, B., Clayton, D. et al. Genome-wide association studies: theoretical and practical concerns. Nat Rev Genet 6, 109–118 (2005). https://doi.org/10.1038/nrg1522