Abstract
The genetic correlation describes the genetic relationship between two traits and can improve our understanding of the biological pathways they share and/or the causal relationships between them. The rarity of large family cohorts with recorded instances of two traits, particularly disease traits, has made it difficult to estimate genetic correlations using traditional epidemiological approaches. However, advances in genomic methodologies, such as genome-wide association studies, and widespread sharing of data now allow genetic correlations to be estimated for virtually any trait pair. Here, we review the definition, estimation, interpretation and uses of genetic correlations, with a focus on applications to human disease.
Acknowledgements
W.v.R. was funded by the ALS Foundation Netherlands. W.J.P. was funded by an NWO Veni grant (91619152). S.H.L. is an ARC Future Fellow (FT160100229). N.R.W. acknowledges funding from the Australian National Health and Medical Research Council (1078901, 1087889 and 1113400). W.v.R. and N.R.W. acknowledge funding from the EU Joint Programme – Neurodegenerative Disease Research (JPND) project (Australia, NHMRC 1151854; The Netherlands, ZonMW project number 733051071). The authors thank K. Tilling, G. Davey Smith and the members of the University of Queensland Program in Complex Trait Genomics for their insightful discussions.
Competing interests
The authors declare no competing interests.
Reviewer information
Nature Reviews Genetics thanks D. Balding, B. Pasaniuc and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Author information
Contributions
All authors researched data for the article, made substantial contributions to discussions of the content and reviewed and/or edited the manuscript before submission. W.v.R. and N.R.W. wrote the article.
Related links
BUHMBOX: http://software.broadinstitute.org/mpg/buhmbox/
fastPAINTOR: https://github.com/gkichaev/PAINTOR_V3.0
GCTA: http://cnsgenomics.com/software/gcta/
GNOVA: https://github.com/xtonyjiang/GNOVA
GWIS: https://sites.google.com/site/mgnivard/gwis
JPND: www.jpnd.eu
LCV: https://github.com/lukejoconnor/LCV
LDAK: http://dougspeed.com/ldak
LD Hub: http://ldsc.broadinstitute.org/ldhub/
LDSC: https://github.com/bulik/ldsc
MR Steiger: https://github.com/explodecomputer/causal-directions
MTAG: https://github.com/omeed-maghzian/mtag
PCGC: https://data.broadinstitute.org/alkesgroup/PCGC/
ρHESS: https://github.com/huwenboshi/hess
PleioPred: https://github.com/yiminghu/PleioPred
Popcorn: https://github.com/brielin/Popcorn
RiVIERA: https://github.com/yueli-compbio/RiVIERA-beta
SMTpred: https://github.com/uqrmaie1/smtpred
SumHer: http://dougspeed.com/sumher/
Glossary
- Parameter
A numerical value that summarizes a characteristic of a population, such as the mean height of men, the lifetime risk of schizophrenia or the heritability of a specific trait.
- Traits
Measurements or phenotypes that are usually studied as the outcome of statistical analyses. They can be quantitative (for example, height) or dichotomous (for example, schizophrenia).
- Estimates
Approximations of a parameter based on a sample of observed data drawn from a population.
- Ascertainment biases
Types of bias that occur when the studied trait or disease affects how data were ascertained. For example, patients with a family history of diabetes may have more frequent examinations for cardiovascular diseases.
- Genome-wide association studies
Studies in which up to millions of mostly common single-nucleotide polymorphisms from across the genome are each tested for association with a trait.
- GWAS summary statistics
The output of statistical tests of association of a trait with each single-nucleotide polymorphism generated by a genome-wide association study (GWAS), typically including the effect allele, signed effect estimate, standard error, test statistic (for example, a z-score) and/or p-value.
- Power
The probability that a study correctly rejects the null hypothesis of no association or correlation when that hypothesis is false, also described as 1 − the type II error rate.
- Bias
Phenomenon where statistical analyses produce estimates in observed data that systematically overestimate or underestimate the population parameter. Bias can arise from the ascertainment of the observed data or the statistical procedures used to generate the estimates.
- Linkage disequilibrium
(LD). The non-random segregation of alleles at two distinct loci. LD induces a correlation between two single-nucleotide polymorphism (SNP) genotypes in the population and is caused by the fact that alleles of neighbouring SNPs are transmitted together until broken down by recombination events.
- Genetic value
(g). The sum of the total effects of all genetic loci on the trait in an individual, that is, g = Xβ, where X is a vector of genotypes for all loci and β is a vector of additive allelic effects on the trait. It is also called the genotypic value, true polygenic (risk) score or breeding value.
- Covariance
(\({\sigma }_{x,y}\)). The expected product of the deviations of two random variables from their respective means (\({\sigma }_{x,y}=E[(X-{\mu }_{x})(Y-{\mu }_{y})]\)).
- Genetic variance
(\({\sigma }_{g}^{2}\)). The expected squared deviation of genetic values from the mean genetic value (\({\sigma }_{g}^{2}=E[{(G-{\mu }_{g})}^{2}]\)); it can also be considered the covariance of a genetic value with itself.
- Heritability
(\({h}^{2}\)). The proportion of phenotypic variance (parameter \({\sigma }_{P}^{2}\), estimate \({V}_{P}\)) attributable to variance in genetic factors. In the context of human traits, most often only additive genetic factors are considered for the genetic variance (parameter \({\sigma }_{A}^{2}\), estimate \({V}_{A}\)) and the ratio of variances is the narrow-sense heritability.
- Latent model
A collection of formalized assumptions to describe a data-generating process through which observed variables (such as disease occurrence) can be used to identify unobserved (latent) variables (for example, genetic parameters: heritability and genetic correlation).
- Phenotypic variance
(\({\sigma }_{P}^{2}\)). Variance of phenotypic values (for example, height or disease liability) after accounting for the variance attributable to fixed effects (for example, sex). When phenotypes are standardized, these phenotypic values are scaled such that \({\mu }_{P}=0\) and \({\sigma }_{P}^{2}=1\).
- Coheritability
(\({h}_{xy}\)). The genetic covariance of standardized traits. Because it is expressed on the same scale as heritability, it allows coheritabilities and heritabilities to be compared directly.
- Linear mixed model
(LMM). A linear model that includes both fixed and random effects to describe phenotypic values and that allows a correlation structure between the random effect levels.
- Restricted maximum likelihood
(REML). A method for maximum likelihood estimation of variance–covariance components of the parameters in linear mixed models.
- Liability threshold model
A model that describes a dichotomous trait (disease) as a threshold partitioning of ‘liability’, which is a latent variable assumed to follow a standard normal distribution in the population. The liability threshold (T) defines lifetime risk (K) of disease as the proportion of individuals exceeding this threshold.
- Risk ratio
Ratio between the risk of disease in a specific group (for example, relatives of affected individuals) and the risk of disease in the general population.
- Tetrachoric correlation
The correlation between two latent normally distributed liability phenotypes assumed to underlie dichotomous population data and estimated from an observed 2 × 2 frequency table.
- Genomic relationship matrix
(GRM). A matrix whose off-diagonal elements represent a coefficient of genetic sharing between individuals to describe the variance–covariance structure between their genetic values calculated from observed single-nucleotide polymorphism (SNP) data. GRM coefficients can be calculated based on different assumptions of the expected distribution of per-SNP heritability.
- SNP-based heritability
An estimate of the proportion of the total phenotypic variance attributable to the additive effects of the class of variants (that is, common single-nucleotide polymorphisms (SNPs)) that are typically genotyped and imputed in pursuit of a genome-wide association study. It is often shortened to SNP heritability, but this should be avoided.
- Genotype by environment (G × E) interaction
Differences in size and/or direction of the effect of genotype on disease risk in two different environments.
- Sample heterogeneity
Differences in the effects of genotype on disease risk in two different cohorts. Potential causes include differences in phenotype criteria, ascertainment methods and unknown environmental differences with genotype by environment interaction.
- Infinitesimal model
This model assumes that a trait is shaped by a very large number of variants with small (infinitesimal) effects resulting in a normally distributed phenotype. A polygenic architecture of >~10 causal variants is approximated well by normal distribution infinitesimal model theory.
- Haseman–Elston regression
Regression of the product of the standardized phenotypes of pairs of individuals on their coefficient of genetic sharing as defined in the genomic relationship matrix.
- Confounding bias
A type of bias that emerges when a covariate, a ‘confounder’, causally influences the predictor variable and outcome variable. When the confounder is not accounted for, the relationship between predictor and outcome may be biased (confounded).
- Assortative mating
Mate selection in which the phenotypes of mates are positively correlated for a trait. Traits that show assortative mating in humans include height and educational attainment.
- Collider bias
A type of bias that emerges when estimates are conditioned on a covariate, a ‘collider,’ that is causally influenced by both the predictor variable and outcome variable.
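Several of the glossary terms above (genetic value, genetic variance, covariance, heritability, coheritability and the liability threshold model) can be tied together in a small simulation. The following is a minimal sketch in Python using only the standard library. It assumes a purely additive model with uncorrelated environmental effects; the function names, the sample size and all parameter values (heritabilities of 0.5 and 0.3, a genetic correlation of 0.6 and a lifetime risk K of 0.01) are illustrative assumptions and are not taken from the article.

```python
import random
from statistics import NormalDist, fmean


def covariance(x, y):
    """Sample estimate of the covariance E[(X - mu_x)(Y - mu_y)]."""
    mean_x, mean_y = fmean(x), fmean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def simulate_two_traits(n=20000, h2_x=0.5, h2_y=0.3, r_g=0.6, seed=1):
    """Simulate genetic values and standardized phenotypes for two traits.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    g_x, g_y, p_x, p_y = [], [], [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        # Genetic values with variances h2_x and h2_y and correlation r_g.
        gx = h2_x ** 0.5 * z1
        gy = h2_y ** 0.5 * (r_g * z1 + (1 - r_g ** 2) ** 0.5 * z2)
        # Independent environmental deviations, scaled so that the
        # phenotypic variance of each trait is 1.
        p_x.append(gx + (1 - h2_x) ** 0.5 * rng.gauss(0, 1))
        p_y.append(gy + (1 - h2_y) ** 0.5 * rng.gauss(0, 1))
        g_x.append(gx)
        g_y.append(gy)
    return g_x, g_y, p_x, p_y


g_x, g_y, p_x, p_y = simulate_two_traits()

# Genetic and phenotypic variances (the covariance of a variable with itself).
var_g_x, var_g_y = covariance(g_x, g_x), covariance(g_y, g_y)
var_p_x, var_p_y = covariance(p_x, p_x), covariance(p_y, p_y)
cov_g = covariance(g_x, g_y)

# Heritability: genetic variance as a proportion of phenotypic variance.
h2_x_hat = var_g_x / var_p_x
h2_y_hat = var_g_y / var_p_y

# Genetic correlation: genetic covariance scaled by the genetic
# standard deviations of the two traits.
r_g_hat = cov_g / (var_g_x * var_g_y) ** 0.5

# Coheritability: genetic covariance of the standardized traits.
coheritability = cov_g / (var_p_x * var_p_y) ** 0.5

# Liability threshold model: treat trait x as a standard normal liability
# and call individuals above the threshold T "affected".
K = 0.01                           # assumed lifetime risk
T = NormalDist().inv_cdf(1 - K)    # liability threshold implied by K
prevalence = sum(p > T for p in p_x) / len(p_x)

print(f"h2_x={h2_x_hat:.3f} h2_y={h2_y_hat:.3f} r_g={r_g_hat:.3f}")
print(f"coheritability={coheritability:.3f} T={T:.3f} prevalence={prevalence:.4f}")
```

In this sketch the genetic values are known because they are simulated. In real data they are unobserved, so the same quantities have to be estimated indirectly, for example from the resemblance between relatives or from genome-wide SNP data.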
About this article
Cite this article
van Rheenen, W., Peyrot, W.J., Schork, A.J. et al. Genetic correlations of polygenic disease traits: from theory to practice. Nat Rev Genet 20, 567–581 (2019). https://doi.org/10.1038/s41576-019-0137-z
DOI: https://doi.org/10.1038/s41576-019-0137-z