Review Article | Published:

Human genetic variation and its contribution to complex traits

Nature Reviews Genetics volume 10, pages 241–251 (2009)



The last few years have seen extensive efforts to catalogue human genetic variation and correlate it with phenotypic differences. Most common SNPs have now been assessed in genome-wide studies for statistical associations with many complex traits, including many important common diseases. Although these studies have provided new biological insights, only a limited amount of the heritable component of any complex trait has been identified and it remains a challenge to elucidate the functional link between associated variants and phenotypic traits. Technological advances, such as the ability to detect rare and structural variants, and a clear understanding of the challenges in linking different types of variation with phenotype, will be essential for future progress.

Key points

  • Human genetic variants are typically referred to as either common or rare, to denote the frequency of the minor allele in the human population. Genetic variants can also be divided into two different nucleotide composition classes — single nucleotide variants and structural variants.

  • The alleles of SNPs located in the same genomic interval are often correlated with one another. This correlation structure, or linkage disequilibrium (LD), varies in a complex and unpredictable manner across the genome and between different populations.

  • Structural variants seem to behave similarly to SNPs in terms of both genomic and population distribution, indicating a similar evolutionary history: both types of variants are 'ancestral', having arisen once in human history and then been shared among individuals by descent rather than being the result of recurrent mutations.

  • Full sequencing of human genomes has shown that in any given individual there are, on average, 4 million genetic variants encompassing 12 Mb of sequence. The challenge is to determine which of these variants underlie or are responsible for the inherited components of phenotypes.

  • Over the last decade or so the human genetics field has debated the common disease–common variant hypothesis, which posits that common complex traits are largely due to common variants with small-to-modest effect sizes. The opposing theory, the rare variant hypothesis, posits that common complex traits are the summation of low-frequency, high-penetrance variants.

  • Genome-wide association (GWA) studies are the most widely used contemporary approach to relate genetic variation to phenotypic diversity. Over the past 2 years these studies have identified statistical associations between hundreds of loci across the genome and common complex traits.

  • Most of the genes or genomic loci that have been identified by GWA studies have not previously been known to be related to the complex trait under investigation. Surprisingly, there have been several instances in which one genomic interval has been associated with two or more seemingly distinct diseases.

  • An unforeseen limitation of GWA studies is that the genomic markers that are found to be associated with any given complex trait each have less impact on susceptibility than was anticipated. Most of the odds ratios for the heterozygote genotypes of the associated variants that have been identified so far are approximately 1.1, a figure that can increase to 1.5–1.6 for homozygote genotypes.

  • At this point, there are almost no complex traits for which more than 10% of the genetic variance is explained, and many are far below that threshold, leaving the bulk of heritability unexplained by the common variants identified so far.

  • One possibility is that the missing variation is accounted for by common genetic variants with small effect sizes that have not yet been identified. Some of the missing heritability is probably accounted for by rare and novel variants. Additionally, there are statistical limitations of the GWA approach in identifying gene–gene and gene–environment interactions, which are likely to be profoundly important.





The authors are supported by a National Institutes of Health grant (NIH 1U54RR025204-01).

Author information


  1. Scripps Genomic Medicine, Scripps Translational Science Institute and The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037, USA.

    • Kelly A. Frazer
    • Sarah S. Murray
    • Nicholas J. Schork
    • Eric J. Topol



Corresponding author

Correspondence to Kelly A. Frazer.


Structural variants

Broadly defined, these are all variants that are not single nucleotide variants. They include insertion–deletions, block substitutions, inversions of DNA sequences and copy number differences.

Genome-wide association (GWA) study

An investigation of the association between common genetic variation and disease. This type of analysis requires a dense set of markers (for example, SNPs) that capture a substantial proportion of common variation across the genome, and large numbers of study subjects.

Complex traits

Continuously distributed phenotypes that are classically believed to result from the independent action of many genes, environmental factors and gene-by-environment interactions.

Minor allele

The less common allele of a polymorphism.
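As a small illustration of this definition, the sketch below computes the minor allele frequency of a biallelic variant from genotype counts. The function name and the example counts are hypothetical and are not taken from the article.

```python
def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Return the frequency of the less common allele at a biallelic site.

    n_aa, n_ab and n_bb are the numbers of individuals with genotypes
    AA, AB and BB (hypothetical labels for the two alleles).
    """
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    if total_alleles == 0:
        raise ValueError("at least one genotyped individual is required")
    # Each AA individual carries two A alleles, each AB individual carries one.
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1.0 - freq_a)


# Hypothetical sample: 640 AA, 320 AB and 40 BB individuals.
print(round(minor_allele_frequency(640, 320, 40), 3))
```

In this made-up sample allele B is the minor allele, because it accounts for fewer than half of the 2,000 chromosomes.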

Linkage disequilibrium (LD)

In population genetics, LD is the nonrandom association of alleles. For example, alleles of SNPs that reside near one another on a chromosome often occur in nonrandom combinations owing to infrequent recombination.
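One way to make the nonrandom association concrete is the sketch below, which computes two widely used LD summaries, the coefficient D and the squared correlation r², for a pair of biallelic SNPs. It assumes phased haplotype counts are available; the function name, allele labels and counts are hypothetical.

```python
from typing import Tuple


def ld_statistics(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> Tuple[float, float]:
    """Return (D, r_squared) for two biallelic SNPs from haplotype counts.

    n_AB counts haplotypes carrying allele A at the first SNP and allele B
    at the second SNP; the other arguments follow the same pattern.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("at least one haplotype is required")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    # D is the excess of the AB haplotype over its expectation under independence.
    d = p_AB - p_A * p_B
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d, (d * d) / denom


# Hypothetical counts: alleles A and B always travel together (complete LD).
print(ld_statistics(50, 0, 0, 50))
# Hypothetical counts: all four haplotypes equally frequent (no LD).
print(ld_statistics(25, 25, 25, 25))
```

An r² close to 1 means that genotyping one SNP tells you almost everything about the other, whereas an r² close to 0 means the two SNPs carry independent information.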

Population stratification

Subdivision of a population into different ethnic groups with potentially different marker allele frequencies and different disease prevalences.

Odds ratio

A measurement of association that is commonly used in case–control studies. It is defined as the odds of exposure to the susceptible genetic variant in cases compared with that in controls. If the odds ratio is significantly greater than one, then the genetic variant is associated with the disease.
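The definition can be sketched numerically as follows. The example computes an odds ratio from a 2×2 table of carrier counts and attaches an approximate 95% confidence interval using the standard error of the log odds ratio (Woolf's method), which is one common choice and is an assumption here rather than the method used in any particular study. The function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(
    cases_with_variant: int,
    cases_without_variant: int,
    controls_with_variant: int,
    controls_without_variant: int,
) -> Tuple[float, float, float]:
    """Return (odds ratio, lower 95% bound, upper 95% bound) for a 2x2 table."""
    cells = (
        cases_with_variant,
        cases_without_variant,
        controls_with_variant,
        controls_without_variant,
    )
    if min(cells) <= 0:
        raise ValueError("all four cell counts must be positive")
    a, b, c, d = cells
    # Odds of carrying the variant in cases divided by the odds in controls.
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's large-sample approximation.
    se_log = math.sqrt(sum(1.0 / n for n in cells))
    half_width = 1.96 * se_log
    lower = math.exp(math.log(estimate) - half_width)
    upper = math.exp(math.log(estimate) + half_width)
    return estimate, lower, upper


# Hypothetical study: 300 of 1,000 cases and 200 of 1,000 controls carry the variant.
estimate, lower, upper = odds_ratio_with_ci(300, 700, 200, 800)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

If the whole interval lies above one, as it does for these made-up counts, the variant would be called associated with the disease in the sense described above.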


Epistasis

In statistical genetics, this term refers to an interaction of multiple genetic variants (usually at different loci) such that the net phenotypic effect of carrying more than one variant is different from what would be predicted by simply combining the effects of each individual variant.
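A minimal numerical sketch of this idea is given below. It assumes that "simply combining" means adding the individual effects on the phenotype scale, which is only one possible choice; the function name and the phenotype means are hypothetical.

```python
def epistatic_deviation(
    mean_neither: float,
    mean_variant1_only: float,
    mean_variant2_only: float,
    mean_both: float,
) -> float:
    """Return observed minus additively expected phenotype for double carriers."""
    effect1 = mean_variant1_only - mean_neither
    effect2 = mean_variant2_only - mean_neither
    # Additive expectation: baseline plus the two single-variant effects.
    expected_both = mean_neither + effect1 + effect2
    return mean_both - expected_both


# Hypothetical trait means: the variants add 2 and 3 units on their own.
print(epistatic_deviation(10.0, 12.0, 13.0, 15.0))  # purely additive
print(epistatic_deviation(10.0, 12.0, 13.0, 20.0))  # more than additive
```

A deviation of zero is what the additive model predicts; a nonzero value is the kind of interaction the definition describes.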
