Genetics in geographically structured populations: defining, estimating and interpreting FST

Key Points

  • Wright's F-statistics, and especially FST, provide important insights into the evolutionary processes that influence the structure of genetic variation within and among populations, and they are among the most widely used descriptive statistics in population and evolutionary genetics.

  • FST is a property of the distribution of allele frequencies among populations. It reflects the joint effects of drift, migration, mutation and selection on the distribution of genetic variation among populations.

  • FST has a central role in population and evolutionary genetics and has wide applications in fields from disease association mapping to forensic science.

  • FST can be used to describe the distribution of genetic variation among any set of samples, but it is most usefully applied when the samples represent discrete units rather than arbitrary divisions along a continuous distribution.

  • Statistics related to FST can be useful for haplotype or microsatellite data if an appropriate measure of evolutionary distance among alleles is available.

  • Comparison of an estimate of FST from marker data with an estimate of QST from continuously varying trait data can be used to detect selection, but the estimate of FST may depend on the choice of marker and the estimate of QST may differ from neutral expectations if there is a non-additive component of genetic variance.

  • Although the simple relationship between FST and migration rates in Wright's island model makes it tempting to infer migration rates from FST, caution is needed if such an approach is to be used.

  • If estimates of FST from many loci are available, it may be possible to identify certain loci as 'outliers' that may have been subject to different patterns of selection or to different demographic processes.

  • Case–control association-mapping studies must account for the possibility that population substructure, rather than a causal link, explains an observed association between a marker and a disease. The genomic control method uses background estimates of FST to control for such substructure.

  • In forensic applications, the probabilities of obtaining a match are sometimes calculated for subpopulations that lack specific allele frequency data. A θ correction, in which θ is FST, is used to calculate the probability of a match using allele frequency information from the broader population of which the subpopulation is a part.

  • The massive amount of data that is being generated by population genomics projects can be understood fundamentally as allelic variation at individual loci. We therefore expect F-statistics to be at least as useful in understanding these data sets as they have been in population and evolutionary genetics for most of the last century.
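The description of FST as a property of the distribution of allele frequencies among populations can be made concrete with a minimal Python sketch. It assumes the classical variance-based definition for a single biallelic locus, FST = Var(p) / (p̄(1 − p̄)), with allele frequencies treated as known exactly and populations weighted equally; it is not the sampling-corrected estimator that real data require. The second function inverts the island-model approximation FST ≈ 1/(4Nm + 1), which holds only under that model's equilibrium assumptions, so its output should be read with the caution urged above. The function names are illustrative.

```python
from statistics import fmean, pvariance


def fst_from_frequencies(freqs):
    """Variance-based F_ST for one biallelic locus.

    Assumes exact (not estimated) allele frequencies and equal
    population weights; no sampling correction is applied.
    """
    p_bar = fmean(freqs)
    if p_bar <= 0.0 or p_bar >= 1.0:
        raise ValueError("locus is monomorphic; F_ST is undefined")
    # Variance of allele frequencies among populations, scaled by
    # the maximum variance possible at this mean frequency.
    return pvariance(freqs, mu=p_bar) / (p_bar * (1.0 - p_bar))


def island_model_nm(fst):
    """Invert F_ST ~ 1 / (4*N*m + 1) to obtain N*m.

    Valid only as an island-model equilibrium approximation.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    fst = fst_from_frequencies([0.2, 0.8])
    print(f"F_ST = {fst:.3f}, implied Nm = {island_model_nm(fst):.3f}")
```

Two populations at frequencies 0.2 and 0.8 have a mean of 0.5 and an among-population variance of 0.09, so the sketch returns FST = 0.36.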

Abstract

Wright's F-statistics, and especially FST, provide important insights into the evolutionary processes that influence the structure of genetic variation within and among populations, and they are among the most widely used descriptive statistics in population and evolutionary genetics. Estimates of FST can identify regions of the genome that have been the target of selection, and comparisons of FST from different parts of the genome can provide insights into the demographic history of populations. For these reasons and others, FST has a central role in population and evolutionary genetics and has wide applications in fields that range from disease association mapping to forensic science. This Review clarifies how FST is defined, how it should be estimated, how it is related to similar statistics and how estimates of FST should be interpreted.
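As one illustration of the forensic application, the θ correction mentioned in the Key Points can be sketched in Python. The formulas below are the match probabilities usually attributed to Balding and Nichols, written from memory of the forensic literature rather than taken from this Review, so they should be treated as an assumption: with θ = 0 they reduce to the Hardy–Weinberg values p² and 2pq. The function name and signature are illustrative.

```python
def match_probability(p_i, p_j=None, theta=0.0):
    """Theta-corrected genotype match probability at one locus.

    p_i, p_j : allele frequencies in the broader reference population.
               Leave p_j as None for a homozygous profile.
    theta    : F_ST between the subpopulation and the reference
               population (assumed known).
    """
    denom = (1.0 + theta) * (1.0 + 2.0 * theta)
    if p_j is None:
        # Homozygote A_i A_i
        return ((2.0 * theta + (1.0 - theta) * p_i)
                * (3.0 * theta + (1.0 - theta) * p_i)) / denom
    # Heterozygote A_i A_j
    return (2.0
            * (theta + (1.0 - theta) * p_i)
            * (theta + (1.0 - theta) * p_j)) / denom


if __name__ == "__main__":
    for theta in (0.0, 0.01, 0.03):
        print(theta, match_probability(0.1, theta=theta))
```

Larger values of θ raise the match probability for rare alleles, which makes the calculation more conservative for the person being compared.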

Figure 1: Locus-specific estimates of FST on human chromosome 7.

Acknowledgements

We thank R. Prunier and K. Theiss for their helpful comments on earlier versions of this Review. The work in the authors' laboratories was supported in part by grants from the US National Institutes of Health (1 R01 GM 068449-01A1 to K.E.H.; 1 R01 GM 075091 to B.S.W.).

Author information

Correspondence to Kent E. Holsinger or Bruce S. Weir.

Related links

FURTHER INFORMATION

Kent E. Holsinger's homepage

1,000 Genomes project

ABC4F (approximate Bayesian computation for F-statistics)

Arlequin (an integrated software application for population genetics data analysis)

BayeScan (BAYEsian genome SCAN for outliers)

Bayesian population genetic data analysis

GDA (Genetic Data Analysis)

GenAlEx (integrated software for analysis of genetic data with an interface to Excel)

Genepop

GESTE (GEnetic STructure inference based on genetic and Environmental data)

Hickory (software for the analysis of geographic structure in genetic data)

Hierfstat (Weir & Cockerham F-statistics for any number of levels in a hierarchy)

International HapMap Project

Nature Reviews Genetics series on Fundamental Concepts in Genetics

The genetic structure of populations

The genetic structure of populations: a Bayesian approach

The Wahlund effect and Wright's F-statistics

Glossary

Genetic drift

The random fluctuations in allele frequencies over time that are due to chance alone.
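A minimal simulation can illustrate this definition. The sketch below assumes the standard Wright–Fisher model (a constant-size, randomly mating diploid population with non-overlapping generations and no mutation, migration or selection), which is one common idealization and not the only way to model drift. The function name and parameters are illustrative.

```python
import random


def wright_fisher(p0, n_diploid, generations, seed=None):
    """Simulate allele-frequency drift at one biallelic locus.

    Each generation, 2 * n_diploid gene copies are drawn at random
    (binomial sampling) from the previous generation's frequency.
    Returns the list of frequencies, starting with p0.
    """
    rng = random.Random(seed)
    n_copies = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Count copies of the focal allele among the sampled gene copies.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    # Replicate populations started at the same frequency diverge by chance alone.
    for replicate in range(3):
        final = wright_fisher(0.5, n_diploid=50, generations=100, seed=replicate)[-1]
        print(f"replicate {replicate}: final frequency {final:.2f}")
```

Running several replicates from the same starting frequency shows how drift alone generates differences in allele frequency among populations, which is the variation that FST summarizes.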

Short tandem repeat loci

Loci consisting of short sequences (2–6 nucleotides) that are repeated multiple times. Alleles at short tandem repeat loci differ from one another in their number of repeats.

Variance

A measure of the amount of variation around a mean value.

Diversifying selection

Selection in which different alleles are favoured in different populations. It is often a consequence of local adaptation (in which genotypes from different populations have higher fitness in their home environments owing to historical natural selection).

Hardy–Weinberg proportions

When the frequency of each diploid genotype at a locus equals that expected from the random union of alleles. That is, the genotypes AA, Aa and aa will be at frequencies p², 2pq and q², respectively, where p and q are the frequencies of alleles A and a.
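These proportions are simple to compute. The Python sketch below returns the expected frequencies of AA, Aa and aa for a given frequency of allele A. As an added assumption that is not part of the glossary entry, it also accepts an optional fixation index f, using the standard relationship in which heterozygosity is reduced by a factor of (1 − f); with f = 0 it returns the Hardy–Weinberg proportions. The function name is illustrative.

```python
def genotype_frequencies(p, f=0.0):
    """Expected frequencies of genotypes (AA, Aa, aa).

    p : frequency of allele A (0 <= p <= 1).
    f : fixation index; f = 0 gives Hardy-Weinberg proportions.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = 1.0 - p
    return (p * p + f * p * q,        # AA
            2.0 * p * q * (1.0 - f),  # Aa
            q * q + f * p * q)        # aa


if __name__ == "__main__":
    print(genotype_frequencies(0.3))         # Hardy-Weinberg proportions
    print(genotype_frequencies(0.3, f=0.1))  # heterozygote deficit
```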

Heterozygote advantage

A pattern of natural selection in which heterozygotes are more likely to survive than homozygotes.

Likelihood

A mathematical function that describes the relationship between the unknown parameters of a statistical distribution — for example, the mean and variance of the allele frequency distribution among populations or the allele frequency in a particular population — and the data. It is directly proportional to the probability of the data given the unknown parameters.

Prior distribution

A statistical distribution used in Bayesian analysis to describe the probability that parameters take on a particular value before examining any data. It expresses the level of uncertainty about those parameters before the data have been analysed.

Posterior distribution

A statistical distribution used in Bayesian analysis to describe the probability that parameters take a particular value after the data have been analysed. It reflects both the likelihood of the data given particular parameters and the prior probability that parameters take particular values.
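The relationship between prior, likelihood and posterior can be shown with the simplest case: estimating one allele frequency from a sample of gene copies. The sketch below assumes a Beta prior and binomial sampling, for which the posterior is again a Beta distribution (a standard conjugate result). This is an illustration only and is not the hierarchical model needed to estimate FST. The function names are illustrative.

```python
def beta_posterior(prior_a, prior_b, k, n):
    """Update a Beta(prior_a, prior_b) prior on an allele frequency.

    k : number of copies of the focal allele observed.
    n : total number of gene copies sampled.
    Returns the (a, b) parameters of the Beta posterior.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return prior_a + k, prior_b + (n - k)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


if __name__ == "__main__":
    # A uniform prior, Beta(1, 1), expresses no preference among frequencies.
    a, b = beta_posterior(1, 1, k=3, n=10)
    print(f"posterior: Beta({a}, {b}), mean = {beta_mean(a, b):.3f}")
```

With a uniform prior and 3 focal alleles among 10 sampled copies, the posterior is Beta(4, 8) with mean 1/3, slightly closer to 0.5 than the sample frequency of 0.3 because the prior still carries some weight.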

Markov chain Monte Carlo methods

Methods that implement a computational technique that is widely used for approximating complex integrals and other functions. In this context, these methods are used to approximate the posterior distribution of a Bayesian model.
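As a toy illustration, the sketch below uses a random-walk Metropolis sampler, one of the simplest Markov chain Monte Carlo algorithms, to approximate the posterior distribution of a single allele frequency under a uniform prior and binomial sampling. The model, the function name and the tuning values (step size, burn-in) are assumptions chosen for brevity; the software used for F-statistics works with far richer models.

```python
import math
import random


def metropolis_allele_frequency(k, n, n_samples=20000, burn_in=1000,
                                step=0.1, seed=0):
    """Sample the posterior of an allele frequency p.

    k of n sampled gene copies carry the focal allele; the prior on p
    is uniform on (0, 1). Returns n_samples draws after burn-in.
    """
    rng = random.Random(seed)

    def log_posterior(p):
        # Binomial log-likelihood; the flat prior adds only a constant.
        return k * math.log(p) + (n - k) * math.log(1.0 - p)

    p = 0.5
    current = log_posterior(p)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = p + rng.gauss(0.0, step)
        # Proposals outside (0, 1) have zero posterior density: reject.
        if 0.0 < proposal < 1.0:
            candidate = log_posterior(proposal)
            accept_prob = math.exp(min(0.0, candidate - current))
            if rng.random() < accept_prob:
                p, current = proposal, candidate
        if i >= burn_in:
            samples.append(p)
    return samples


if __name__ == "__main__":
    draws = metropolis_allele_frequency(3, 10)
    print(f"posterior mean is approximately {sum(draws) / len(draws):.3f}")
```

For this simple model the exact posterior is known (a Beta distribution with mean (k + 1)/(n + 2)), which gives a convenient check that the sampler is behaving sensibly.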

Multinomial distribution

A statistical distribution that describes the probability of obtaining a sample with a specified number of objects in each of several categories. The probability is determined by the total sample size and the probability of drawing an object from each category. The binomial distribution is a special case of the multinomial distribution in which there are two categories.
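The definition translates directly into a short Python function. The sketch computes the probability of an observed vector of counts given the category probabilities; the worked case, genotype counts under Hardy–Weinberg proportions, is an illustrative choice and not an example from the Review. The function name is illustrative.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` objects in each category.

    counts : non-negative integers, one per category.
    probs  : category probabilities (should sum to 1).
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    # Multinomial coefficient n! / (c1! * c2! * ...), kept as an exact integer.
    coeff = factorial(sum(counts))
    for c in counts:
        coeff //= factorial(c)
    return coeff * prod(p ** c for p, c in zip(probs, counts))


if __name__ == "__main__":
    # Genotype counts (AA, Aa, aa) in a sample of 4 individuals, with
    # Hardy-Weinberg proportions for an allele frequency of 0.5.
    print(multinomial_pmf([2, 1, 1], [0.25, 0.5, 0.25]))
    # Binomial special case: two categories.
    print(multinomial_pmf([1, 1], [0.5, 0.5]))
```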

Additive genetic variance

The part of the total genetic variation that is due to the main (or additive) effects of alleles on a phenotype. The additive variance determines the degree of resemblance between relatives and therefore the response to selection.

Stabilizing selection

Selection in which either the same allele or the same genotype is favoured in different populations.

Effective population size

Formulated by Wright in 1931, the effective population size reflects the size of an idealized population that would experience drift in the same way as the actual (census) population. The effective population size can be lower than the census population size owing to various factors, including a history of population bottlenecks and reduced recombination.

Coalescent-based approaches

Approaches that use statistical properties of the genealogical relationship among alleles under particular demographic and mutational models to make inferences about the effective size of populations and about rates of mutation and migration.

Conditional autoregressive scheme

A statistical approach developed for analysis of data in which a random effect is associated with the spatial location of each observation. The magnitude of the random effect is determined by a weighted average of the random effects of nearby positions. In most applications, the weights of the averages are inversely related to the spatial distance between two sample points.

Cite this article

Holsinger, K., Weir, B. Genetics in geographically structured populations: defining, estimating and interpreting FST. Nat Rev Genet 10, 639–650 (2009). https://doi.org/10.1038/nrg2611

  • DOI: https://doi.org/10.1038/nrg2611
