Key Points
-
Population and quantitative genetics theory is built with parameters that describe relatedness, and estimation of these parameters from genetic markers enables progress in fields as disparate as plant breeding, human disease gene mapping and forensic science.
-
Relatedness can be described by the probabilities that two individuals share zero, one or two pairs of alleles that are identical-by-descent. More probabilities are needed if the individuals are inbred, meaning that their parents were related.
-
Alternative hypotheses about the relationship between two individuals can be evaluated by dividing the probability of the observed genotypes of the individuals under one hypothesis by the probability of the genotypes under the other. The ratio of probabilities is called the likelihood ratio. In paternity testing, it is called the paternity index.
-
The probabilities of patterns of identity-by-descent can be estimated by the method of maximum likelihood.
-
Even for individuals whose parents are not related, and who are therefore not inbred, account needs to be taken of 'background relatedness' that is due to evolutionary history in a population.
-
Even though the probabilities of identity-by-descent are defined by the family and population relatedness of two individuals, there is variation in actual identity-by-descent along the genome. This reflects the differences in actual genealogies at different loci, and it is influenced by recombination along with mutation and natural selection.
-
Relationships are best estimated with highly polymorphic markers, which minimize the ambiguity between identity-in-state and identity-by-descent. However, reliable estimates can also be obtained with a sufficiently large number of biallelic SNPs.
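The likelihood-ratio calculation described in the key points above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes non-inbred individuals, Hardy-Weinberg proportions, independent (unlinked) loci, known allele frequencies and no background relatedness. The names RELATIONSHIPS, genotype_prob, pair_prob and likelihood_ratio are hypothetical and chosen only for this example.

```python
from itertools import product

# Probabilities (k0, k1, k2) of sharing zero, one or two pairs of
# identical-by-descent (IBD) alleles for some standard non-inbred relationships.
RELATIONSHIPS = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent_child": (0.0, 1.0, 0.0),
    "full_sibs": (0.25, 0.5, 0.25),
    "half_sibs": (0.5, 0.5, 0.0),
}


def genotype_prob(genotype, freqs):
    """Hardy-Weinberg probability of one unordered genotype (a, b)."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def pair_prob(g1, g2, k, freqs):
    """Probability of the genotype pair (g1 for individual 1, g2 for individual 2)
    at one locus, given IBD probabilities k = (k0, k1, k2)."""
    g1, g2 = tuple(sorted(g1)), tuple(sorted(g2))
    k0, k1, k2 = k
    # No alleles IBD: the two genotypes are independent draws.
    p0 = genotype_prob(g1, freqs) * genotype_prob(g2, freqs)
    # One pair IBD: individual 1 is {a, b}, individual 2 is {a, c}, sharing a.
    p1 = sum(
        freqs[a] * freqs[b] * freqs[c]
        for a, b, c in product(freqs, repeat=3)
        if tuple(sorted((a, b))) == g1 and tuple(sorted((a, c))) == g2
    )
    # Two pairs IBD: the genotypes must be identical.
    p2 = genotype_prob(g1, freqs) if g1 == g2 else 0.0
    return k0 * p0 + k1 * p1 + k2 * p2


def likelihood_ratio(pairs, freqs_by_locus, k_num, k_den):
    """Multiply per-locus ratios, assuming independent loci.
    Raises ZeroDivisionError if a pair is impossible under k_den."""
    lr = 1.0
    for (g1, g2), freqs in zip(pairs, freqs_by_locus):
        lr *= pair_prob(g1, g2, k_num, freqs) / pair_prob(g1, g2, k_den, freqs)
    return lr
```

As a check on the sketch: for two individuals who are both homozygous AA at a locus where allele A has frequency p, the parent-child versus unrelated ratio is p^3 / p^4 = 1/p, so `likelihood_ratio([(("A", "A"), ("A", "A"))], [{"A": 0.2, "B": 0.8}], RELATIONSHIPS["parent_child"], RELATIONSHIPS["unrelated"])` evaluates to approximately 5.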
Abstract
Individuals who belong to the same family or the same population are related because of their shared ancestry. Population and quantitative genetics theory is built with parameters that describe relatedness, and the estimation of these parameters from genetic markers enables progress in fields as disparate as plant breeding, human disease gene mapping and forensic science. The large numbers of multiallelic microsatellite loci and biallelic SNPs that are now available have markedly increased the precision with which relationships can be estimated, although they have also revealed unexpected levels of genomic heterogeneity of relationship measures.
Relevant articles
Open Access articles citing this article.
-
Improved computations for relationship inference using low-coverage sequencing data
BMC Bioinformatics Open Access 09 March 2023
-
KIN: a method to infer relatedness from low-coverage ancient DNA
Genome Biology Open Access 17 January 2023
-
Pedigree reconstruction and genetic analysis of major ornamental characters of ornamental crabapple (Malus spp.) based on paternity analysis
Scientific Reports Open Access 18 August 2022
Change history
22 December 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41576-021-00445-6
Acknowledgements
This work was supported in part by grants from the National Institutes of Health, the National Institute of Justice and the National Science Foundation. We are grateful to W.G. Hill and the reviewers for helpful comments.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Glossary
- Additive variance
-
The portion of the variance of a quantitative trait that is due to the separate (additive) effects of the individual alleles at the loci that influence the trait.
- Dominance variance
-
The portion of the variance of a quantitative trait that is due to the interaction of the two alleles that an individual carries at the loci that influence the trait.
- Affected-relative linkage studies
-
Studies that aim to estimate the degree of linkage between a disease and a marker locus on the basis of the marker genotypes of relatives who have the disease.
- Microsatellite
-
Also known as a short tandem repeat. A class of repetitive DNA that is made up of repeats that are 2–5 nucleotides in length. The number of these repeats is usually extremely variable in a population.
- Linkage disequilibrium
-
The non-random association of alleles at different loci, whether or not the loci are linked.
- Minisatellite
-
A region of DNA in which repeat units of 10–50 bp are tandemly arranged in arrays that are 0.5–30 kb in length.
- Association study
-
A study that aims to identify the joint occurrence of two genetically encoded characteristics in a population. Often, an association between a genetic marker and a phenotype (for example, a disease) is assessed.
- Inbreeding coefficient
-
The probability that an individual carries two identical-by-descent alleles at a locus.
- Coancestry coefficient
-
The probability that two alleles at a locus, one taken at random from each of two individuals, are identical-by-descent. It is also called the coefficient of parentage or coefficient of consanguinity.
- Unordered genotypes
-
The probability of unordered genotypes does not require specifying which genotype belongs to which individual (for example, which is for the parent and which is for the child). By contrast, the probability of ordered genotypes requires this information.
- Likelihood ratio
-
The ratio of two probabilities for the same observations, calculated under alternative hypotheses. In the context of relatedness analysis, the likelihood ratio is formed by dividing the probability of the observed pair of genotypes using the identical-by-descent probabilities for one possible relationship by the probability of the genotypes using identical-by-descent probabilities for the other possible relationship. The likelihood ratio is a continuous variable that can take any non-negative value, and values greater than one support the relationship used for the numerator.
- CODIS forensic set
-
A set of 13 highly polymorphic and essentially unlinked microsatellite markers that were developed by the US Federal Bureau of Investigation for human identification purposes.
- Bayesian (framework)
-
An inference framework in which the posterior probability of a parameter depends explicitly on its prior probability, reflecting some previous belief about this parameter.
- Maximum likelihood (method)
-
The process of estimating parameters by choosing their values to maximize the probability of some observed data.
- Bayes theorem
-
The means of going from a probability of one event, given another, to the probability of the second event, given the first. It is often used to express the (posterior) probability of a hypothesis, given some data, as being proportional to the probability of the data, given the hypothesis, multiplied by the (prior) probability of the hypothesis.
- Prior probability
-
The probability of an event or hypothesis before consideration of some data that will alter the probability of that event or hypothesis.
- Posterior probability
-
The probability of an event or hypothesis after consideration of some data that have altered the probability of that event or hypothesis.
- Population substructure
-
The existence of groups of individuals within a population that have some degree of reproductive isolation from the rest of the population, and for which the allele frequencies are likely to be different from the population as a whole.
- Kin selection
-
William D. Hamilton's theory to explain the evolution of the hallmark of social life: altruistic cooperation (carrying out functions that are costly to the individual but that benefit others). By helping a relative, an individual increases its inclusive fitness by increasing the number of copies of its genes in the population.
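The glossary entries for the maximum likelihood method and the coancestry coefficient can be illustrated with a short Python sketch. It is an assumption-laden example rather than the method of any cited paper: it assumes non-inbred individuals, Hardy-Weinberg proportions, independent loci and known allele frequencies, and it uses a coarse grid search in place of a proper numerical optimizer. The names ibd_conditionals, log_likelihood, estimate_k and coancestry are hypothetical.

```python
import math
from itertools import product


def ibd_conditionals(g1, g2, freqs):
    """Probability of a genotype pair at one locus given that the two
    individuals share 0, 1 or 2 pairs of identical-by-descent alleles."""
    g1, g2 = tuple(sorted(g1)), tuple(sorted(g2))

    def geno(g):
        a, b = g
        return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

    p0 = geno(g1) * geno(g2)
    # One shared allele a: individual 1 is {a, b}, individual 2 is {a, c}.
    p1 = sum(
        freqs[a] * freqs[b] * freqs[c]
        for a, b, c in product(freqs, repeat=3)
        if tuple(sorted((a, b))) == g1 and tuple(sorted((a, c))) == g2
    )
    p2 = geno(g1) if g1 == g2 else 0.0
    return p0, p1, p2


def log_likelihood(k, conditionals):
    """Log-probability of all loci for IBD probabilities k = (k0, k1, k2)."""
    total = 0.0
    for probs in conditionals:
        lik = sum(ki * pi for ki, pi in zip(k, probs))
        if lik <= 0.0:
            return float("-inf")  # data impossible under this k
        total += math.log(lik)
    return total


def estimate_k(pairs, freqs_by_locus, steps=20):
    """Grid-search maximum likelihood estimate of (k0, k1, k2)."""
    conditionals = [
        ibd_conditionals(g1, g2, freqs)
        for (g1, g2), freqs in zip(pairs, freqs_by_locus)
    ]
    # All points on the simplex k0 + k1 + k2 = 1 with spacing 1/steps.
    grid = [
        (i / steps, j / steps, (steps - i - j) / steps)
        for i in range(steps + 1)
        for j in range(steps + 1 - i)
    ]
    return max(grid, key=lambda k: log_likelihood(k, conditionals))


def coancestry(k):
    """Coancestry coefficient of two non-inbred individuals: k1/4 + k2/2."""
    _, k1, k2 = k
    return k1 / 4 + k2 / 2
```

With only a handful of loci the likelihood surface is flat and the estimate tends to land on the boundary of the parameter space; in practice many markers are needed, which is consistent with the point that reliable estimates require highly polymorphic markers or a large number of SNPs.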
Cite this article
Weir, B., Anderson, A. & Hepler, A. Genetic relatedness analysis: modern data and new challenges. Nat Rev Genet 7, 771–780 (2006). https://doi.org/10.1038/nrg1960
DOI: https://doi.org/10.1038/nrg1960