Genetic polymorphism results from mutations along the branches of unknown genealogical trees, so genealogical models are needed for the analysis of polymorphism data.
The stochastic process known as 'the coalescent' has become the primary tool for modelling genealogies.
An extension of classical population-genetics models, the coalescent views lineages as randomly choosing parents going backwards in time.
The coalescent is a flexible model that accommodates phenomena such as recombination, age structure, geographical structure and population size change.
Efficient simulations and inference based on the coalescent allow tests about causes of genetic variation and estimation of demographic parameters, such as migration rates.
Unlike methods borrowed from phylogenetics that attempt to draw inferences from estimated genealogies, the coalescent treats genealogies as random and can naturally handle complex models that incorporate phenomena such as migration, selection and recombination.
In the age of genomic polymorphism data, coalescent-based methods will acquire greater roles in such areas as the inference of evolutionary history, the study of linkage disequilibrium in the human genome and the population genetics of infectious disease.
Improvements in genotyping technologies have led to the increased use of genetic polymorphism for inference about population phenomena, such as migration and selection. Such inference presents a challenge, because polymorphism data reflect a unique, complex, non-repeatable evolutionary history. Traditional analysis methods do not take this into account. A stochastic process known as the 'coalescent' presents a coherent statistical framework for analysis of genetic polymorphisms.
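The idea of lineages randomly choosing parents backwards in time can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method. It assumes a haploid Wright-Fisher population of constant size with no recombination, selection or structure. The function name `simulate_tmrca` and its parameters are hypothetical, introduced only for this example.

```python
import random


def simulate_tmrca(n_samples, pop_size, rng):
    """Count generations until n_samples lineages share one common ancestor.

    Assumes a haploid Wright-Fisher population of constant size pop_size.
    """
    lineages = n_samples
    generations = 0
    while lineages > 1:
        # Going one generation back, each lineage picks its parent
        # uniformly at random from the pop_size individuals.
        parents = {rng.randrange(pop_size) for _ in range(lineages)}
        # Lineages that picked the same parent coalesce (merge).
        lineages = len(parents)
        generations += 1
    return generations


if __name__ == "__main__":
    rng = random.Random(1)
    n, N = 10, 100  # illustrative sample size and population size
    times = [simulate_tmrca(n, N, rng) for _ in range(2000)]
    # For n much smaller than N, coalescent theory predicts a mean of
    # roughly 2N(1 - 1/n) generations under these assumptions.
    print("mean simulated TMRCA:", sum(times) / len(times))
    print("approximate expectation:", 2 * N * (1 - 1 / n))
```

Only the number of distinct parents is tracked here, because under these assumptions the lineages are exchangeable. Recording which lineages merge at each step would yield the genealogical tree itself.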
We thank H. Innan and J. Pritchard for comments, and M. Tanaka, C. Wiuf and an anonymous reviewer for careful reading of the manuscript.
- POLYMORPHISM DATA
Data that include the genotypes of many individuals sampled at one or more loci; here we consider a locus to be polymorphic if two or more distinct types are observed, regardless of their frequencies.
- HAPLOTYPE
The allelic configuration of multiple genetic markers that are present on a single chromosome of a given individual.
- COALESCENCE
The merging of ancestral lineages going back in time.
- STOCHASTIC PROCESS
A mathematical description of the random evolution of a quantity through time.
- BACTERIAL CONJUGATION
Genetic recombination in prokaryotes that is mediated through direct transfer of DNA from a donor to a recipient cell.
- GENETIC DRIFT
The random fluctuations in allele frequencies over time that are due to chance alone.
- HORIZONTAL TRANSFER
The transfer of genetic material between members of the same generation, or between members of different species.
- ESTIMATOR
A function that produces an estimate of some parameter.
- BALANCING SELECTION
The selection that maintains two or more alleles in a population.
- ADAPTIVE RADIATION
The evolution of new species or subspecies to fill unoccupied ecological niches.
- BAYESIAN APPROACH
A statistical perspective that focuses on the probability distribution of parameters, before and after seeing the data.
- FREQUENTIST APPROACH
A statistical perspective that focuses on the frequency with which an observed value is expected in numerous trials.
- TEST STATISTIC
A function that produces values from data for comparing with expected values under various models.
- VARIANCE
A statistic that quantifies the dispersion of data about the mean.
- BOTTLENECK
A temporary marked reduction in population size.
- LIKELIHOOD ANALYSIS
A statistical method that considers the likelihood of observing the data under alternative models.
- IMPORTANCE SAMPLING
A computational technique for efficient numerical calculation of likelihoods.
- MARKOV CHAIN MONTE CARLO
A computational technique for efficient numerical calculation of likelihoods.
- SUMMARY STATISTIC
A function that summarizes complex data in terms of simple numbers (examples include mean and variance).
- SEGREGATING SITE
A DNA base-pair position at which polymorphism is observed in a population.
- ADMIXTURE
The mixing of two genetically differentiated populations.
- PHYLOGEOGRAPHY
The use of estimated gene genealogies to study geographical history and structure of populations and species.
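Several glossary terms (segregating site, summary statistic, estimator) can be made concrete with a small sketch. The example below counts segregating sites in a set of aligned sequences and converts that count into Watterson's estimator of the population mutation parameter, a standard estimator that is not defined in the text above and is included here as an assumption for illustration. The function names and the toy alignment are hypothetical.

```python
def segregating_sites(seqs):
    """Count alignment columns at which two or more distinct bases occur."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # zip(*seqs) iterates over the alignment column by column.
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator: S divided by the harmonic sum 1 + 1/2 + ... + 1/(n-1)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are needed")
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


if __name__ == "__main__":
    # Toy alignment of four sequences, invented for illustration only.
    sample = ["ACGTAC", "ACGTTC", "ACCTAC", "ACGTAC"]
    print("segregating sites:", segregating_sites(sample))
    print("Watterson's theta:", watterson_theta(sample))
```

The count of segregating sites is a summary statistic in the glossary's sense: it reduces the full alignment to a single number, discarding information about allele frequencies and about which sequences carry which variants.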
Cite this article
Rosenberg, N., Nordborg, M. Genealogical trees, coalescent theory and the analysis of genetic polymorphisms. Nat Rev Genet 3, 380–390 (2002). https://doi.org/10.1038/nrg795