Most human diseases are dichotomous and are measured on a binary scale (disease absent (0) or present (1)). Some of the observed phenotypic variation of disease can be attributed to genetic variation.
Heritability is the ratio of the genetic variation to the phenotypic variation. Its estimates are specific to the population, disease and circumstances on which it is estimated.
Methods to estimate heritability for continuous traits do not directly apply to disease, and heritability is often estimated on an assumed normally distributed liability that underlies disease. This is called the heritability of liability to disease ($h_x^2$) and should be distinguished from the heritability of disease on the observed scale, that is, of disease itself ($h_{0/1}^2$).
Methods of estimation based on mixed linear models have the ability to exploit data composed of various relatives and are the recommended methods of estimation in practice.
Difficulties of estimation, potential biases and a lack of consistent interpretation have made heritability a controversial summary statistic of familial aggregation of disease.
Despite its caveats, heritability is the single most useful measure of familial aggregation of disease. The heritability captures information from multiple relatives and can be interpreted in a wider context than competing measures such as the sibling relative risk, which is useful only in the context of siblings. Moreover, unlike other measures of familial aggregation, it attempts to separate environmental and genetic sources of familial correlation.
The main sources of bias in heritability estimates are common environmental factors, genotype-by-environment interactions, disease diagnosis and ascertainment, and the change of scale from the observed to the liability scale when $h_{0/1}^2$ is estimated.
Heritability estimates are useful because they set limits to the contribution of genetic factors to variation of disease; however, identifying genetic and environmental sources of familial covariance should remain the primary aim of future research.
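The change of scale mentioned in the key points can be illustrated with the classical transformation usually attributed to Dempster and Lerner (with an appendix by Robertson). What follows is a minimal sketch, assuming a randomly sampled population (no ascertainment), a normally distributed liability and a known prevalence. The function name `observed_to_liability` and its parameters are illustrative and do not come from the article.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed: float, prevalence: float) -> float:
    """Convert heritability on the 0/1 scale to the liability scale.

    Uses h_x^2 = h_{0/1}^2 * K(1 - K) / z^2, where K is the population
    prevalence and z is the standard normal density at the liability
    threshold that leaves a proportion K in the upper tail.
    Assumption: unselected population sample (no ascertainment correction).
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold
    z = std_normal.pdf(threshold)                     # density at threshold
    return h2_observed * prevalence * (1.0 - prevalence) / z ** 2


# Hypothetical input: observed-scale heritability 0.05, prevalence 1%
h2_liability = observed_to_liability(0.05, 0.01)
```

The conversion factor grows quickly as the disease becomes rarer, so a modest observed-scale estimate can correspond to a substantial heritability of liability. This is one reason the choice of scale matters for interpretation.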
Relatives provide the basic material for the study of inheritance of human disease. However, the methodologies for the estimation of heritability and the interpretation of the results have been controversial. The debate arises from the plethora of methods used, the validity of the methodological assumptions and the inconsistent and sometimes erroneous genetic interpretations made. We will discuss how to estimate disease heritability, how to interpret it, how biases in heritability estimates arise and how heritability relates to other measures of familial disease aggregation.
This work was supported by Cancer Research UK (C12229/A13154) and the UK Biotechnology and Biological Sciences Research Council (BB/K000195/1). We acknowledge the financial support provided by the MRC–HGU Core Fund and the Roslin Institute through its Strategic Programme Grant. We thank W. G. Hill for helpful comments on an earlier version of the manuscript.
The authors declare no competing financial interests.
- Tetrachoric correlation
An estimate of the correlation between two bivariate normal variables obtained from the 2 × 2 contingency table containing the counts of two categorical variables (for example, disease and non-disease in two types of relative).
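As an illustration of the definition above, the sketch below estimates a tetrachoric correlation from a 2 × 2 table by choosing thresholds from the margins and then searching for the correlation whose bivariate normal upper-orthant probability matches the observed proportion of doubly affected pairs. It uses only the Python standard library with a simple Simpson integration. The function names and the cell labels (`n11` both affected, `n10` and `n01` one affected, `n00` neither affected) are assumptions made for this example.

```python
from statistics import NormalDist

_STD = NormalDist()


def upper_orthant(t1: float, t2: float, rho: float, steps: int = 2000) -> float:
    """P(X > t1, Y > t2) for a standard bivariate normal with correlation rho."""
    upper = max(t1, 0.0) + 8.0            # far enough into the tail
    h = (upper - t1) / steps              # steps must be even for Simpson's rule
    scale = (1.0 - rho * rho) ** 0.5

    def integrand(x: float) -> float:
        # density of X times P(Y > t2 | X = x)
        return _STD.pdf(x) * (1.0 - _STD.cdf((t2 - rho * x) / scale))

    total = integrand(t1) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(t1 + i * h)
    return total * h / 3.0


def tetrachoric(n11: int, n10: int, n01: int, n00: int, tol: float = 1e-6) -> float:
    """Tetrachoric correlation by matching the doubly affected cell."""
    n = n11 + n10 + n01 + n00
    t1 = _STD.inv_cdf(1.0 - (n11 + n10) / n)   # threshold, first relative
    t2 = _STD.inv_cdf(1.0 - (n11 + n01) / n)   # threshold, second relative
    target = n11 / n
    lo, hi = -0.999, 0.999
    while hi - lo > tol:                        # orthant probability rises with rho
        mid = (lo + hi) / 2.0
        if upper_orthant(t1, t2, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical table of relative pairs
rho_hat = tetrachoric(200, 100, 100, 200)
```

Bisection is used here because the orthant probability is monotone in the correlation. Dedicated statistical software would normally also report a standard error, which this sketch does not.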
- Mixed linear models
Statistical models used to analyse grouped data. Grouped data, such as repeated measurements, generate within-group correlations that need to be accounted for to make correct inferences. In the context of heritability estimation, they separate fixed effects (for example, gender or age) from random effects (individuals).
- Bayesian methods
Bayesian methods of inference combine prior beliefs about a hypothesis with the information provided by the available data to modify those prior beliefs. The stronger the prior beliefs are, the more data will be required to modify them. Bayesian methods could help inference that is based on small sample sizes, where maximum likelihood methods may fail.
- Maximum likelihood methods
Methods or techniques used for statistical inference. These methods are used for deriving functions of the sample (technically called estimators) that when applied to particular samples give estimates of the population parameters. The maximum likelihood estimates of the unknown parameters are the parameter values under which the observed data are most probable.
- Unbiased estimator
A population parameter (for example, a variance) is estimated from a random population sample using an estimator (for example, a formula). An estimator is unbiased if the mean of the estimates it produces over many samples, regardless of their size, is the population parameter.
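The idea of unbiasedness can be checked exactly on a toy example. The sketch below enumerates every ordered sample of size two, drawn with replacement from a tiny hypothetical population, and averages two variance estimators over all of them. The population values are invented for illustration.

```python
from itertools import product
from statistics import mean, pvariance, variance

# Hypothetical three-value population; its true variance is 2/3
population = [0.0, 1.0, 2.0]
true_variance = pvariance(population)

# All ordered samples of size 2 drawn with replacement (equally likely)
samples = list(product(population, repeat=2))

# Divisor n - 1: averages exactly to the population variance (unbiased)
mean_unbiased = mean(variance(s) for s in samples)

# Divisor n: averages to half the population variance here (biased)
mean_biased = mean(pvariance(s) for s in samples)
```

Averaged over all possible samples, the estimator with divisor n − 1 recovers the population variance, whereas the estimator with divisor n underestimates it.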
- Additive genetic values
Also called breeding values, these are defined as the sum of the average effects of the alleles an individual carries and, in the context of disease, as the average disease risk a person will confer to their children. Both definitions are equivalent only when there is no interaction between loci.
- Genotypic values
For a given genotype, these values are the expected phenotypes that arise from the combined expression of all of the genes contributing to the trait. In the context of disease, the genotypic value on the observed scale is the penetrance (that is, the probability of disease given the genotype).
- Ascertainment
The process of identifying cases of disease in the population. Ascertainment and sampling are often used synonymously, especially when talking of ascertainment or sampling bias.
- Index case
Also called proband; Falconer used the term propositi to refer to the probands. This is the patient within a family who is first recruited to the study. Because other relatives are actively recruited as a consequence of the index case recruitment, the families are not a representative sample of the general population.
- Proband concordance rate
($q_c$). Defined as the proportion of twins with the disease of interest among twins who are independently ascertained. That is, each twin pair is counted once for each twin independently brought to the study. It can be computed as $q_{\mathrm{twin}} = 2n_{11}/(2n_{11} + n_{10})$.
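The arithmetic of this formula can be written as a short function. The sketch assumes that `n11` is the number of twin pairs in which both twins are affected and `n10` the number of pairs in which only one twin is affected. These labels are not spelled out in the glossary entry, and the function name is illustrative.

```python
def proband_concordance(n11: int, n10: int) -> float:
    """Proband concordance rate 2*n11 / (2*n11 + n10).

    Assumption: n11 = concordant affected pairs, n10 = discordant pairs.
    Concordant pairs are counted twice, once for each affected twin.
    """
    if n11 < 0 or n10 < 0 or (n11 == 0 and n10 == 0):
        raise ValueError("need non-negative counts and at least one pair")
    return 2 * n11 / (2 * n11 + n10)


# Hypothetical counts: 20 concordant and 60 discordant pairs
rate = proband_concordance(20, 60)
```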
- Nested models
Two statistical models are nested if both models contain the same terms and one model has at least one additional term. The model with the larger number of terms is the full model, and the other is the reduced model. For instance, model P = A + C + E is nested within P = A + D + C + E. Models P = A + C + E and P = A + D + E are non-nested.
- Likelihood ratio test
(LRT). Used to compare how well a model (full model) and a subset of that model (reduced model) fit the data. It is calculated as $\mathrm{LRT} = -2\ln(L_{\mathrm{Reduced}}/L_{\mathrm{Full}})$ and is distributed as $\chi^2_r$, where $r$ is the difference in the number of parameters fitted in the two models.
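A minimal sketch of this calculation follows, working with log-likelihoods so that the ratio becomes a difference. To stay within the Python standard library it covers only one or two degrees of freedom, for which the chi-squared tail probability has a closed form. The log-likelihood values and function names are hypothetical.

```python
import math


def lrt_statistic(loglik_reduced: float, loglik_full: float) -> float:
    """LRT = -2 ln(L_reduced / L_full), computed from log-likelihoods."""
    return -2.0 * (loglik_reduced - loglik_full)


def chi2_pvalue(stat: float, df: int) -> float:
    """Upper-tail chi-squared probability for df = 1 or df = 2 only."""
    stat = max(stat, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df of 1 or 2")


# Hypothetical fits: reduced model P = C + E versus full model P = A + C + E
stat = lrt_statistic(loglik_reduced=-105.0, loglik_full=-103.0)
p_value = chi2_pvalue(stat, df=1)
```

When the tested variance component lies on the boundary of its parameter space (a variance of zero), the reference distribution is a mixture rather than a plain chi-squared, so a p-value computed this way is conservative.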
- Akaike information criterion
An approach used to compare non-nested models. The Akaike information criterion penalizes complicated models by adding two times the number of fitted parameters to minus twice the maximized log-likelihood. The model with the smallest Akaike information criterion is chosen as the most parsimonious.
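As a small worked example, the sketch below computes the criterion as 2k − 2 ln L for two non-nested variance component models and selects the smaller value. The log-likelihoods and parameter counts are invented for illustration.

```python
def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * loglik


# Hypothetical maximized log-likelihoods and numbers of fitted parameters
candidates = {
    "ACE": (-103.0, 3),
    "ADE": (-104.2, 3),
}

scores = {name: aic(loglik, k) for name, (loglik, k) in candidates.items()}
best_model = min(scores, key=scores.get)  # smallest AIC is preferred
```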
- Identity-by-descent
Two alleles are identical-by-descent if they are a copy of the same allele carried in an ancestral individual.
- Realized identity-by-descent
Actual, as opposed to expected, identity-by-descent sharing between pairs of individuals as estimated from their genotypes. It accounts for the deviations from the expected identity-by-descent values that arise from the random segregation of alleles.
- Population attributable risk
Also called the population attributable fraction. For a given disease, risk factor and population, the population attributable risk for the population incidence rate is the fraction by which the incidence rate of the disease in the population would be reduced if the risk factor was eliminated.
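The glossary gives no formula for this quantity. One common way to compute it, for a single binary risk factor, is Levin's formula, sketched below. It assumes a known exposure prevalence and relative risk and ignores confounding. The function name and the example values are illustrative.

```python
def population_attributable_fraction(exposure_prevalence: float,
                                     relative_risk: float) -> float:
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1)).

    Assumptions: one binary risk factor, unconfounded relative risk.
    """
    if not 0.0 <= exposure_prevalence <= 1.0:
        raise ValueError("exposure prevalence must be between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative risk must be positive")
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical risk factor: half the population exposed, relative risk of 3
paf = population_attributable_fraction(0.5, 3.0)
```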
- Sequence-wide association studies
The extension of array-based genome-wide association studies to whole-genome sequence-based association studies.
About this article
Cite this article
Tenesa, A., Haley, C. The heritability of human disease: estimation, uses and abuses. Nat Rev Genet 14, 139–149 (2013). https://doi.org/10.1038/nrg3377