Review Article

Relatedness in the post-genomic era: is it still useful?

Nature Reviews Genetics volume 16, pages 33–44 (2015)

Abstract

Relatedness is a fundamental concept in genetics but is surprisingly hard to define in a rigorous yet useful way. Traditional relatedness coefficients specify expected genome sharing between individuals in pedigrees, but actual genome sharing can differ considerably from these expected values, which in any case vary according to the pedigree that happens to be available. Nowadays, we can measure genome sharing directly from genome-wide single-nucleotide polymorphism (SNP) data; however, there are many such measures in current use, and we lack good criteria for choosing among them. Here, we review SNP-based measures of relatedness and criteria for comparing them. We discuss how useful pedigree-based concepts remain today and highlight opportunities for further advances in quantitative genetics, with a focus on heritability estimation and phenotype prediction.

Key points

  • Relatedness is a fundamental concept in everyday life and in quantitative genetics. It has a central role in efforts to understand genetic mechanisms and in predicting phenotypes, as well as in population, evolutionary and forensic genetics.

  • Traditionally, the relatedness of two individuals was measured in terms of the fraction of genome they share IBD (identity-by-descent), which is defined as inheritance from a recent common ancestor, but there are many approaches to interpreting 'recent'.

  • A better viewpoint is given by coalescent theory: the time since the most recent common ancestor for two individuals varies along the genome and can take an essentially continuous range of possible values.

  • There are now many different ways to measure the genetic similarity between pairs of individuals using genome-wide single-nucleotide polymorphism (SNP) data. The binary IBD versus non-IBD distinction provides a simple approximation but gives an inadequate representation of reality compared with the precision offered by the extensive data sets available nowadays.

  • We argue that, for many applications, traditional concepts of relatedness are no longer required; instead, models and analyses can be based directly on genome similarity.

  • There is no one best measure of genome similarity, but different measures can be evaluated on their performance in specific applications.
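As a concrete illustration of the point that several SNP-based similarity measures coexist, the following minimal sketch computes two of them from the same genotype matrix. It assumes genotypes coded as 0, 1 or 2 copies of a reference allele with no missing values; the function names `ibs_similarity` and `allelic_correlation` are hypothetical and chosen for this example only. The standardization used in the second function is one common choice, not necessarily the one preferred in this Review.

```python
import numpy as np

def ibs_similarity(G):
    """Mean identity-by-state sharing between all pairs of individuals.

    G is an (n individuals x m SNPs) array of genotypes coded 0, 1, 2
    (an assumed coding). Per SNP, sharing is 1 - |g_i - g_j| / 2.
    """
    G = np.asarray(G, dtype=float)
    diff = np.abs(G[:, None, :] - G[None, :, :])  # shape (n, n, m)
    return 1.0 - diff.mean(axis=2) / 2.0

def allelic_correlation(G):
    """Average over SNPs of products of standardized genotypes.

    Each SNP is centred by twice its sample allele frequency p and
    scaled by sqrt(2p(1-p)); monomorphic SNPs are dropped.
    """
    G = np.asarray(G, dtype=float)
    p = G.mean(axis=0) / 2.0                      # sample allele frequencies
    keep = (p > 0) & (p < 1)                      # avoid division by zero
    X = (G[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    return X @ X.T / X.shape[1]

if __name__ == "__main__":
    # Toy data: individuals 0 and 1 are identical, individual 2 differs.
    G = np.array([[0, 1, 2, 0],
                  [0, 1, 2, 0],
                  [2, 1, 0, 2]])
    print(ibs_similarity(G))
    print(allelic_correlation(G))
```

The two matrices rank pairs similarly on this toy input but sit on different scales: the IBS measure lies between 0 and 1, whereas the allelic correlation is centred on the sample and can be negative. Which scale is more useful depends on the application.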

Acknowledgements

The authors thank G. Hellenthal and D. Kennett (both University College London), and M. Beaumont (University of Bristol) for discussion. This work is funded by the UK Medical Research Council under grant G0901388, with support from the National Institute for Health Research University College London Hospitals Biomedical Research Centre. Access to Wellcome Trust Case Control Consortium data was authorized as work related to the project “Genome-wide association study of susceptibility and clinical phenotypes in epilepsy”.

Author information

Author notes

    • David J. Balding

    Present address: Department of Genetics and Department of Mathematics and Statistics, University of Melbourne, Parkville VIC 3010, Australia.

Affiliations

  1. UCL Genetics Institute, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK.

    • Doug Speed
    • David J. Balding

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to David J. Balding.

Supplementary information

PDF files

  1. Supplementary information S1 (Box): Details of simulations

  2. Supplementary information S2 (Box): DNA and pedigree ancestors

  3. Supplementary information S3 (Box): Estimating effective population size

  4. Supplementary information S4 (Box): Pedigrees generate genome-wide dependence among coalescent trees

  5. Supplementary information S5 (Box): Expected IBD region length and expected number of regions shared IBD

  6. Supplementary information S6 (Box): Probability that IBD implies IBS

  7. Supplementary information S7 (Box): Mixed model likelihood

  8. Supplementary information S8 (Box): Inferring the power parameter α

Glossary

Relatedness

Two individuals are related if they have a recent common ancestor, where 'recent' can be variously defined as outlined under IBD (identity-by-descent).

IBD

(Identity-by-descent; also identical-by-descent). The phenomenon whereby two individuals share a genomic region as a result of inheritance from a recent common ancestor, where 'recent' can mean from an ancestor in a given pedigree, or with no intervening mutation event or with no intervening recombination event.

Pedigree

A set of individuals connected by parent–child relationships.

Most recent common ancestor

(MRCA). Although the ancestries of two alleles may both pass through the same individual, they pass through different alleles with probability 0.5, in which case that individual is not the MRCA of the alleles.

Time since the MRCA

(TMRCA; in generations). If the times back to a common ancestor differ between two individuals, then the average is used.

Heritability

The proportion of phenotypic variation that can be attributed to any genetic variation (broad-sense heritability) or to additive genetic variation (narrow-sense heritability (h²)).

Lineage paths

Sequences of parent–child steps linking individuals with length equal to the number of steps.

Coancestry

(θ). A kinship coefficient defined as the probability that two homologous alleles, one drawn from each of two individuals, are IBD (identical-by-descent).

Inbreeding coefficients

The coancestries of the two parents of an individual.

Maximum likelihood estimators

Estimates of unknown parameters obtained by maximizing the likelihood for the observed data given a statistical model.

Method of moments estimators

Estimates of unknown parameters obtained by equating theoretical moments (for example, mean, variance and skewness) under the assumed statistical model to empirical moments calculated from the observed data.

Coalescent tree

Each leaf of the tree corresponds to an observed allele, and the root represents the most recent common ancestor (MRCA) of all observed alleles. The internal nodes (branching points) represent the MRCA of the alleles at the leaves connected to that node (without passing the root). Distances along branches represent time, measured in generations.

IBS

(Identical-by-state; also identity-by-state). When two homologous alleles have matching type. Some definitions of IBS exclude IBD (identity-by-descent).

Linkage disequilibrium

(LD). A population correlation of allele pairs drawn at different genomic loci in the same gamete (that is, in a haploid genome).
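The glossary definitions of coancestry and inbreeding coefficients can be made concrete with a classical pedigree recursion. The sketch below is illustrative only: the pedigree representation (a dictionary mapping each individual to a father and mother, listed with parents before children) and the names `make_coancestry` and `inbreeding` are assumptions of this example, and founders or unknown parents are treated as unrelated and non-inbred.

```python
from functools import lru_cache

def make_coancestry(pedigree):
    """Return a function theta(a, b) giving pedigree-based coancestry.

    pedigree maps individual -> (father, mother), with (None, None)
    for founders. Entries must be listed with parents before children.
    """
    order = {ind: k for k, ind in enumerate(pedigree)}

    @lru_cache(maxsize=None)
    def theta(a, b):
        if a is None or b is None:
            return 0.0                      # unknown parent: assumed unrelated
        if a == b:
            # Two alleles drawn with replacement from one individual.
            father, mother = pedigree[a]
            return 0.5 * (1.0 + theta(father, mother))
        # Recurse through the individual listed later, who cannot be
        # an ancestor of the other.
        if order[a] < order[b]:
            a, b = b, a
        father, mother = pedigree[a]
        return 0.5 * (theta(father, b) + theta(mother, b))

    return theta

def inbreeding(pedigree, theta, ind):
    """Inbreeding coefficient: the coancestry of the individual's parents."""
    father, mother = pedigree[ind]
    return theta(father, mother)

if __name__ == "__main__":
    # Hypothetical pedigree: two founders, two full siblings and their child.
    ped = {
        "dad": (None, None),
        "mum": (None, None),
        "sib1": ("dad", "mum"),
        "sib2": ("dad", "mum"),
        "child": ("sib1", "sib2"),
    }
    theta = make_coancestry(ped)
    print(theta("sib1", "sib2"))            # full siblings: 0.25
    print(inbreeding(ped, theta, "child"))  # child of full siblings: 0.25
```

These values are expectations conditional on the pedigree supplied. As the abstract notes, actual genome sharing can differ considerably from such expected values, and the result changes if a deeper pedigree becomes available.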

About this article

DOI

https://doi.org/10.1038/nrg3821
