Review Article

Pleiotropy in complex traits: challenges and strategies

Nature Reviews Genetics volume 14, pages 483–495 (2013)


Genome-wide association studies have identified many variants that each affect multiple traits, particularly across autoimmune diseases, cancers and neuropsychiatric disorders, suggesting that pleiotropic effects on human complex traits may be widespread. However, systematic detection of such effects is challenging and requires new methodologies and frameworks for interpreting cross-phenotype results. In this Review, we discuss the evidence for pleiotropy in contemporary genetic mapping studies, new and established analytical approaches to identifying pleiotropic effects, sources of spurious cross-phenotype effects and study design considerations. We also outline the molecular and clinical implications of such findings and discuss future directions of research.

Key points

  • Genome-wide association studies have identified many novel loci for hundreds of traits. Interestingly, numerous genetic loci have been associated with multiple seemingly distinct traits. These cross-phenotype (CP) associations highlight the relevance of pleiotropy in human disease.

  • There is substantial evidence for CP associations in contemporary gene-mapping studies.

  • Different types of pleiotropy (biological, mediated and spurious pleiotropy) can underlie a CP association.

  • Various analytical approaches have been devised for detecting CP associations, notably methods based on summary statistics rather than individual-level data. The methods differ in their underlying algorithms and in the types of phenotype data that they handle, and each has relative advantages and disadvantages.

  • Study design considerations are crucial for minimizing the identification of spurious CP associations.

  • CP associations can highlight shared biological pathways and, when associated with different diseases, have clinical implications for diagnosis, counselling and treatment.
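As a minimal sketch of what a summary-statistic CP test can look like (not any specific method from this Review), Fisher's combined probability test pools the per-trait p-values for one SNP into X = −2 Σ ln p_t, which follows a χ² distribution with 2k degrees of freedom under the null hypothesis that the SNP affects none of the k traits. The function name `fisher_combined_p` is illustrative, and the sketch assumes the per-trait tests are independent, an assumption that fails when studies share subjects such as common controls.

```python
import math


def fisher_combined_p(p_values):
    """Combine per-trait p-values for one SNP with Fisher's method.

    Assumes the k tests are independent (e.g. non-overlapping samples).
    X = -2 * sum(ln p) is chi-squared with 2k degrees of freedom under the null.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    x = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom (2k) the chi-squared survival function has
    # the closed form exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Hypothetical p-values for one SNP tested against three traits.
    print(fisher_combined_p([0.01, 0.2, 0.03]))
```

A small combined p-value shows only that at least one trait is associated; it does not show that the SNP affects every trait, so on its own it is weak evidence that a CP association reflects pleiotropy across all of them.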






This work was supported in part by the US National Institute of Mental Health (NIMH) grants R01-MH079799 and K24MH094614 (both to J.W.S.).

Author information


  1. Center for Human Genetics Research, Massachusetts General Hospital, 185 Cambridge Street, Boston, Massachusetts 02114, USA.

    • Nadia Solovieff
    • Phil H. Lee
    • Shaun M. Purcell
    • Jordan W. Smoller
  2. Department of Psychiatry, Harvard Medical School, 2 West, Room 305, 401 Park Drive, Boston, Massachusetts 02215, USA.

    • Nadia Solovieff
    • Phil H. Lee
    • Shaun M. Purcell
    • Jordan W. Smoller
  3. Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA.

    • Nadia Solovieff
    • Phil H. Lee
    • Shaun M. Purcell
    • Jordan W. Smoller
  4. Departments of Neurology and Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, Connecticut 06520, USA.

    • Chris Cotsapas
  5. Medical and Population Genetics, Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, Massachusetts 02142, USA.

    • Chris Cotsapas
  6. Division of Psychiatric Genomics, Mount Sinai School of Medicine, 1 Gustave L. Levy Place, New York, New York 10029-6574, USA.

    • Shaun M. Purcell



Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Jordan W. Smoller.


Genome-wide association studies

(GWASs). Studies in which hundreds of thousands (or millions) of genetic markers are tested for association with a phenotypic trait; they provide an unbiased approach to surveying the entire genome for disease-associated regions using common variation.


Genome-wide significance

A term describing the statistical significance threshold that accounts for multiple testing in GWASs.
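The conventional threshold can be reproduced with simple arithmetic. Assuming, as is commonly done, about one million independent common-variant tests across the genome, a Bonferroni correction at a family-wise error rate of 0.05 gives the widely used threshold of 5 × 10⁻⁸. The function name below is illustrative.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if not 0.0 < alpha < 1.0 or n_tests < 1:
        raise ValueError("alpha must lie in (0, 1) and n_tests must be >= 1")
    return alpha / n_tests


# Assumption: roughly one million independent common-variant tests genome-wide.
GENOME_WIDE_THRESHOLD = bonferroni_threshold(0.05, 1_000_000)

if __name__ == "__main__":
    print(f"{GENOME_WIDE_THRESHOLD:.0e}")
```

The one-million figure is an approximation generally applied to common variants in European-ancestry samples; denser variant sets or populations with less linkage disequilibrium imply more independent tests.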

Complex traits

Traits controlled by a combination of many genes and environmental factors.


Pleiotropic

A gene or genetic variant that affects more than one phenotypic trait.


Heritability

The proportion of phenotypic variance attributed to genetic differences among individuals in a population.


Different genetic variants, located in the same gene and in high linkage disequilibrium with one another, that affect different phenotypes.

Single-nucleotide polymorphisms

Single-nucleotide positions in the genome at which the base varies across individuals in the population.

Linkage disequilibrium

(LD). The correlation between genetic markers owing to limited recombination.
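For two biallelic SNPs, this correlation is often reported as r², computed from the LD coefficient D = p_AB − p_A·p_B. The sketch below assumes phased haplotype frequencies are already known; `ld_r2` is a hypothetical helper name.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a, p_b: population frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # LD coefficient D: deviation from independence
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


if __name__ == "__main__":
    # Hypothetical frequencies: B always occurs on the same haplotype as A.
    print(ld_r2(0.3, 0.3, 0.3))
```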

Copy number variants

Regions of the genome in which the copy number is polymorphic (for example, deletions and duplications) across individuals.


Polygenic

Controlled by many genes.

Population stratification

A source of bias in genome-wide association studies that occurs when both a phenotype and the allele frequency of a single-nucleotide polymorphism vary with ancestry, so that an association reflects ancestral differences rather than a causal effect.

Batch effect

Systematic biases in the data that arise from differences in sample handling.

Genotype imputation

Inference of missing genotypes or untyped single-nucleotide polymorphisms using statistical techniques.

Ascertainment bias

A consequence of collecting a systematically biased, nonrandom subsample, so that results based on the subsample are not representative of the entire sample.

Tag SNPs

Single-nucleotide polymorphisms (SNPs) chosen to represent a region of the genome owing to strong linkage disequilibrium.
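Tag SNPs are often chosen greedily so that every SNP in a region has r² above a threshold with at least one tag. The sketch below is one simple greedy scheme, assuming a precomputed symmetric r² matrix and a threshold of 0.8; the function name is hypothetical, and practical tagging tools add refinements not shown here.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedily pick tag SNPs so every SNP has r^2 >= threshold with some tag.

    r2: symmetric square list of lists of pairwise r^2 values, with r2[i][i] == 1.
    Returns the indices of the chosen tags, in the order they were picked.
    """
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        # Choose the SNP covering the most still-untagged SNPs (ties -> lowest index).
        best = max(range(n), key=lambda j: sum(1 for i in untagged if r2[i][j] >= threshold))
        tags.append(best)
        untagged -= {i for i in untagged if r2[i][best] >= threshold}
    return tags


if __name__ == "__main__":
    # Hypothetical region: SNPs 0 and 1 are in strong LD, SNP 2 is independent.
    matrix = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
    print(greedy_tag_snps(matrix))
```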

Multivariate analyses

The simultaneous inclusion of two or more phenotypes in one analysis when testing the association with a genetic variant.

Univariate analyses

Tests of association between one phenotype and a genetic variant.

Polygenic scoring

A score that aggregates the number of risk alleles a subject carries, weighted by the effect size of each allele for a particular trait. The risk allele and effect size for each single-nucleotide polymorphism are generally taken from a genome-wide association study carried out in an independent sample.
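A minimal sketch of such a score, assuming the weights are log odds ratios from an independent discovery GWAS and that each genotype is coded as the count of the same risk allele the weight refers to. The function and the SNP identifiers are hypothetical. In a cross-phenotype setting, weights estimated for one trait can be used to score individuals who are then tested for association with a second trait.

```python
import math


def polygenic_score(genotypes, weights):
    """Weighted sum of risk-allele counts for one individual.

    genotypes: dict mapping SNP id -> number of risk alleles carried (0, 1 or 2)
    weights:   dict mapping SNP id -> effect size of that risk allele,
               e.g. log(odds ratio) from an independent discovery GWAS
    SNPs without a weight are ignored.
    """
    return sum(weights[snp] * count for snp, count in genotypes.items() if snp in weights)


if __name__ == "__main__":
    # Hypothetical SNP identifiers and odds ratios, for illustration only.
    discovery_weights = {"snp1": math.log(1.20), "snp2": math.log(0.90)}
    person = {"snp1": 2, "snp2": 1, "snp3": 1}  # snp3 has no weight and is skipped
    print(polygenic_score(person, discovery_weights))
```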

Linear mixed-effect model

A linear model that contains both fixed and random effects. This type of model can be used to estimate genetic correlation between traits using a genome-wide set of single-nucleotide polymorphisms.
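Models of this kind typically use a SNP-derived genomic relationship matrix (GRM) as the covariance structure of the random genetic effect; a bivariate version then estimates the genetic covariance, and hence the genetic correlation, between two traits. The sketch below only builds the GRM from standardized genotypes, assuming allele frequencies are estimated from the sample itself; the REML fitting step is not shown, the function name is hypothetical, and some software uses a slightly different estimator for the diagonal.

```python
import math


def genomic_relationship_matrix(genotypes):
    """SNP-derived genomic relationship matrix (GRM).

    genotypes: one list per individual of allele counts (0, 1 or 2) at the same M SNPs.
    Entry A[j][k] = (1/M) * sum_i (x_ij - 2p_i)(x_ik - 2p_i) / (2p_i(1 - p_i)),
    where p_i is the sample allele frequency. Monomorphic SNPs are skipped.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    standardized = []  # one list of standardized genotypes per usable SNP
    for i in range(m):
        counts = [genotypes[j][i] for j in range(n)]
        p = sum(counts) / (2 * n)
        if p == 0.0 or p == 1.0:
            continue
        scale = math.sqrt(2 * p * (1 - p))
        standardized.append([(x - 2 * p) / scale for x in counts])
    if not standardized:
        raise ValueError("no polymorphic SNPs")
    used = len(standardized)
    return [
        [sum(snp[j] * snp[k] for snp in standardized) / used for k in range(n)]
        for j in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical genotypes: 3 individuals x 4 SNPs.
    for row in genomic_relationship_matrix([[0, 1, 2, 1], [1, 1, 0, 2], [2, 0, 1, 1]]):
        print([round(x, 3) for x in row])
```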

Cohort studies

Observational studies in which defined groups of people (the cohorts) are followed over time and outcomes are compared in subsets of the cohort who were exposed to different levels of factors of interest. These studies can be carried out either prospectively or retrospectively from historical records.

Cross-sectional studies

Studies in which data are collected on subjects at one specific point in time and subjects are not selected for a particular trait or exposure.

Case–control study

Compares cases (that is, a selected group of individuals: for example, those diagnosed with a disorder) with controls (that is, a comparison group of individuals: for example, those who are not diagnosed with the disorder). Genome-wide association case–control studies test whether genetic marker allele frequencies differ between cases and controls.
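In its simplest form, this comparison of allele frequencies is a 1-degree-of-freedom χ² test on a 2 × 2 table of allele counts. The sketch assumes the counts are already tallied (each person contributes two alleles) and that every row and column total is non-zero; it ignores covariates such as ancestry principal components, which a real GWAS would usually include through logistic regression. The function name is hypothetical.

```python
import math


def allelic_chi2_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """1-df allelic chi-squared test on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the two alleles.
    Returns (chi2 statistic, p-value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(map(sum, table))
    row_totals = [sum(r) for r in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of chi-squared with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # Hypothetical allele counts: 60/40 in cases versus 40/60 in controls.
    print(allelic_chi2_test(60, 40, 40, 60))
```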

Generalized estimating equations

A statistical technique used to estimate regression parameters that does not require the joint distribution of the variables to be fully specified.

Log-linear model

A statistical model that captures the dependence among a set of categorical variables.

Bayesian network

A network that captures relationships between variables or nodes of interest (for example, phenotypes and SNPs). Bayesian networks can incorporate prior information in establishing relationships between variables.

Ordinal regression

A regression model in which the outcome variable is ordinal.

Non-parametric approach

A statistical analysis method that does not rely on specific distributional assumptions (for example, normality) for the variables being analysed.

Principal components analysis

A statistical method used to simplify data sets by transforming a series of correlated variables into a smaller number of uncorrelated factors. It is also commonly used to infer continuous axes of variation in genetic data, often representing genetic ancestry.
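The first principal component is the direction of greatest variance, that is, the leading eigenvector of the covariance matrix. As a dependency-free sketch, the code below finds it by power iteration; it assumes a small data set whose largest eigenvalue is strictly dominant, and the function name is hypothetical. Genome-wide ancestry analyses rely on optimized linear-algebra libraries and compute several components at once.

```python
import math
import random


def first_principal_component(data, iterations=500, seed=0):
    """Leading principal component of `data` by power iteration.

    data: list of observations (rows), each a list of the same numeric variables.
    Returns (unit loading vector, its eigenvalue = variance explained, row scores).
    """
    n, d = len(data), len(data[0])
    if n < 2:
        raise ValueError("need at least two observations")
    means = [sum(row[k] for row in data) / n for k in range(d)]
    centred = [[row[k] - means[k] for k in range(d)] for row in data]
    # Sample covariance matrix of the variables.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)] for a in range(d)]

    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # random positive starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data have zero variance")
        v = [x / norm for x in w]

    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    scores = [sum(r[k] * v[k] for k in range(d)) for r in centred]
    return v, eigenvalue, scores


if __name__ == "__main__":
    # Hypothetical data: two strongly correlated variables.
    print(first_principal_component([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]]))
```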

Summary statistics

A statistic that summarizes a set of observations. In the context of genome-wide association studies, meta-analyses can be carried out solely using summary statistics, which typically include the estimated effect size (for example, odds ratio) and its standard error.
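Given only effect estimates and standard errors, an inverse-variance-weighted fixed-effects meta-analysis needs nothing else. The sketch assumes independent samples and a single common true effect; the function name is hypothetical. When the inputs are effects of one SNP on different phenotypes, the common-effect assumption is strong, which is one reason cross-phenotype methods must account for effect heterogeneity.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance-weighted fixed-effects meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns (pooled beta, pooled SE, z statistic, two-sided p-value).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty lists")
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical log odds ratios and standard errors from three studies.
    print(fixed_effects_meta([0.12, 0.08, 0.10], [0.04, 0.05, 0.03]))
```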

Effect heterogeneity

Different effect sizes across phenotypes.
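Effect heterogeneity is commonly quantified with Cochran's Q, the weighted sum of squared deviations of each estimate from the inverse-variance pooled estimate (approximately χ² with k − 1 degrees of freedom when effects are homogeneous), and with I² = max(0, (Q − df)/Q), the share of variation attributed to heterogeneity rather than chance. A minimal sketch with a hypothetical function name follows; an allele that raises risk of one disease and lowers risk of another gives a large Q even when the pooled effect is near zero.

```python
def cochran_q_i2(betas, ses):
    """Cochran's Q and I^2 for effect estimates with standard errors.

    Returns (Q, degrees of freedom, I^2 as a fraction between 0 and 1).
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two matching estimates")
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, i2


if __name__ == "__main__":
    # Hypothetical opposite-direction effects of one allele on two diseases.
    print(cochran_q_i2([0.2, -0.2], [0.05, 0.05]))
```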

Expression quantitative trait loci

Loci at which allelic variation is associated with variation in gene expression.

Fine mapping

Dense genotyping or sequencing of a region of the genome implicated by genome-wide association studies, with the aim of pinpointing the causal variant.

Confounding factor

A variable (for example, batch effects or population structure) that is associated with both the genotype and the phenotype of interest and can give rise to a spurious association.

Genetic architecture

A genetic model (that is, the number of single-nucleotide polymorphisms, their effect sizes, allele frequencies and so on) underlying a phenotypic trait.
