Estimating the mutation load in human genomes

Journal name:
Nature Reviews Genetics


Next-generation sequencing technology has facilitated the discovery of millions of genetic variants in human genomes. A sizeable fraction of these variants are predicted to be deleterious. Here, we review the pattern of deleterious alleles as ascertained in genome sequencing data sets and ask whether human populations differ in their predicted burden of deleterious alleles — a phenomenon known as mutation load. We discuss three demographic models that are predicted to affect mutation load and relate these models to the evidence (or the lack thereof) for variation in the efficacy of purifying selection in diverse human genomes. We also emphasize why accurate estimation of mutation load depends on assumptions regarding the distribution of dominance and selection coefficients — quantities that remain poorly characterized for current genomic data sets.

At a glance


  1. Figure 1: Proportion of deleterious variants found in an individual's genome classified by their frequency in the population (common versus rare).

    We wanted to ascertain whether the deleterious portion of an individual's genome is mostly represented by rare or common variants. For the Yoruba (YRI) population in the 1000 Genomes Project, variants were assigned to three selection regimes (moderate, large and extreme), according to genomic evolutionary rate profiling (GERP) score categories in increasing order of phylogenetic conservation: 2:4, 4:6 and >6. The more conserved a site is, the more likely it is that a new allele is deleterious (Box 2). Deleterious single-nucleotide polymorphisms (SNPs) with a derived allele frequency lower than 5% within the population (shown in purple) are classified as 'rare' and the rest as 'common' (shown in blue). Almost 70% of the deleterious SNPs found in an individual genome are common, and most of them have a small predicted effect ('moderate'). Half of the rare SNPs also have a moderate effect, and half of them have a large effect, demonstrating how low-frequency, large-effect variants have not yet been purged by purifying selection.

  2. Figure 2: Differences in the site frequency spectrum across populations for deleterious and neutral variants.

    The site frequency spectrum (SFS) can be a powerful way of summarizing genomic data. The figure shows the SFSs for four populations, focusing on both low-frequency variants (minor allele frequency (MAF) <0.18; left panels) and nearly fixed variants (MAF >0.82; right panels). Derived variants were annotated with genomic evolutionary rate profiling (GERP) scores (see Supplementary information S1 (box)). In part a, we plot single-nucleotide polymorphisms (SNPs) that are predicted to have a 'large' deleterious effect (GERP >4). In part b, we plot SNPs that are predicted to have a 'neutral' effect (GERP <2). Using 1000 Genomes Project Phase 1 exome data (Ref. 34), we sampled 42 individuals from the Yoruba (YRI, Nigeria), Mexican (MXL, Mexico), Tuscan (TSI, Italy) and Japanese (JPT, Japan) populations. Only individuals sequenced on the same Agilent exome platform were compared here to avoid biases in target capture between platforms. Differences in demographic history produce a different SFS for each population, and the neutral variants provide a null model of demography. The African YRI population has the highest number of rare deleterious variants, whereas the JPT and TSI populations have many more fixed deleterious variants, possibly owing to ancient founder effects in which strong drift drove variants to fixation (also noted in Ref. 48). By comparing the neutral and deleterious SFSs (see Supplementary information S2 (figure)), one can infer the impact of purifying selection. For example, non-African populations have a larger proportion of fixed variants among deleterious sites than among neutral sites.

  3. Figure 3: Schematic of different demographic models for the Out-of-Africa dispersal.

    Three demographic models have been discussed in the context of changes in genetic load due to extreme genetic drift across different human populations. All three models allow for a severe Out-of-Africa bottleneck and recovery but with varying degrees of subsequent changes in population size. Coloured dots indicate allelic diversity; the width of the column is proportional to the effective population size (Ne). The bottom tube represents the ancestral African population size, with later events occurring in temporal sequence towards the top of the figure.

  4. Figure 4: Mutation load under an additive and a recessive model.

    Using the same data set as in Fig. 2, we computed the total mutation load (Ref. 2) for each population. Genomic evolutionary rate profiling (GERP) scores were annotated for whole-exome data. Variants were grouped into three categories according to their GERP score (2:4, 4:6 and >6), corresponding to different biological functional effects. The more phylogenetically conserved a site is, the more likely it is that a new allele is deleterious and has a high GERP score (see Supplementary information S1 (box)). Within each category, three selection coefficients were assigned, using the s coefficients inferred by Boyko et al. (Ref. 47): s = −4.5 × 10⁻⁴, s = −4.5 × 10⁻³ and s = −1 × 10⁻². The total mutation load is the sum of the load at each locus (Ref. 2). The mutation load under an additive model is higher than the mutation load under a recessive model because, under a recessive model, the fitness effect of a deleterious variant is masked in heterozygotes and is expressed only in derived homozygotes. Although only slight differences exist between populations for an additive model of dominance (~1.5%), strong differences occur under a recessive model because of the differential number of derived homozygotes among populations. JPT, Japanese (Japan); MXL, Mexican (Mexico); TSI, Tuscan (Italy); YRI, Yoruba (Nigeria).
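
The per-locus sum described in the Figure 4 caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the standard textbook form of the load at one locus under Hardy-Weinberg proportions, 2hsq(1 − q) + sq², with genotype fitnesses 1, 1 − hs and 1 − s, where q is the derived allele frequency, s is the magnitude of the selection coefficient and h is the dominance coefficient (0.5 for additive, 0 for fully recessive). The one-to-one assignment of the three quoted s values to the three GERP categories, the handling of scores that fall exactly on a category boundary, the function names and the toy variants are all assumptions made for illustration.

```python
# Magnitudes of the selection coefficients quoted in the Figure 4 caption.
# Assumption: one coefficient per GERP category, in increasing order.
S_BY_GERP_BIN = {"2:4": 4.5e-4, "4:6": 4.5e-3, ">6": 1e-2}


def gerp_bin(score):
    """Return the GERP category for a score, or None for scores below 2.

    Assumption: a score exactly on a boundary goes to the bin it opens
    (4 -> "4:6"), and a score of exactly 6 stays in "4:6".
    """
    if score > 6:
        return ">6"
    if score >= 4:
        return "4:6"
    if score >= 2:
        return "2:4"
    return None


def locus_load(q, s, h):
    """Expected fitness reduction at one locus under Hardy-Weinberg proportions.

    Heterozygotes (frequency 2q(1-q)) lose h*s; derived homozygotes
    (frequency q^2) lose s.
    """
    return 2.0 * h * s * q * (1.0 - q) + s * q * q


def total_load(variants, h):
    """Sum the per-locus load over (gerp_score, derived_allele_frequency) pairs."""
    total = 0.0
    for score, q in variants:
        category = gerp_bin(score)
        if category is None:
            continue  # treated as neutral, contributes no load
        total += locus_load(q, S_BY_GERP_BIN[category], h)
    return total


# Hypothetical variants: (GERP score, derived allele frequency).
toy = [(2.5, 0.30), (4.8, 0.05), (6.3, 0.01), (1.0, 0.50)]
additive = total_load(toy, h=0.5)
recessive = total_load(toy, h=0.0)
print(additive, recessive)
```

With h = 0.5 each locus contributes sq, whereas with h = 0 it contributes only sq², so for any allele that is not fixed the additive load exceeds the recessive load, in line with the caption.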

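The site frequency spectrum summarized in the Figure 2 caption can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure used to draw the figure: the input is taken to be one derived allele count per site in a sample of n chromosomes, the spectrum is unfolded (based on derived rather than minor allele frequency), and the function names and toy counts are hypothetical. The 0.18 and 0.82 cut-offs are the ones quoted in the caption.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n_chromosomes):
    """Return sfs, where sfs[k] is the number of sites at which exactly k of
    the n sampled chromosomes carry the derived allele.

    Only segregating sites (0 < k < n) are counted; sites that are absent
    or fixed in the sample are dropped.
    """
    tally = Counter(derived_counts)
    sfs = [0] * (n_chromosomes + 1)
    for k, n_sites in tally.items():
        if 0 < k < n_chromosomes:
            sfs[k] = n_sites
    return sfs


def fraction_in_range(sfs, lo, hi):
    """Fraction of segregating sites whose derived allele frequency lies
    strictly between lo and hi."""
    n = len(sfs) - 1
    total = sum(sfs)
    if total == 0:
        return 0.0
    selected = sum(count for k, count in enumerate(sfs) if lo < k / n < hi)
    return selected / total


# Hypothetical derived allele counts at eight sites in 10 chromosomes.
counts = [1, 1, 2, 9, 9, 5, 10, 0]
sfs = site_frequency_spectrum(counts, 10)
low = fraction_in_range(sfs, 0.0, 0.18)    # low-frequency class
high = fraction_in_range(sfs, 0.82, 1.0)   # nearly fixed class
print(sfs, low, high)
```

Computing the two fractions separately for sites annotated as deleterious and as neutral gives the kind of contrast the caption describes, with the neutral spectrum serving as the demographic baseline.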

  1. Ohta, T. & Gillespie, J. Development of neutral and nearly neutral theories. Theor. Popul. Biol. 49, 128–142 (1996).
    This paper reviews the development of the neutral and nearly neutral theories by key contributors to the field of population genetics.
  2. Kimura, M., Maruyama, T. & Crow, J. F. The mutation load in small populations. Genetics 48, 1303–1312 (1963).
    This is a foundational paper on the effect of drift on mutation load in finite populations, demonstrating that mildly deleterious alleles can contribute more to load than strongly deleterious alleles.
  3. King, J. L. & Jukes, T. H. Non-Darwinian evolution. Science 164, 788–798 (1969).
  4. Marth, G. T., Czabarka, E., Murvai, J. & Sherry, S. T. The allele frequency spectrum in genome-wide human variation data reveals signals of differential demographic history in three large world populations. Genetics 166, 351–372 (2004).
  5. Laval, G., Patin, E., Barreiro, L. B. & Quintana-Murci, L. Formulating a historical and demographic model of recent human evolution based on resequencing data from noncoding regions. PLoS ONE 5, e10284 (2010).
  6. Gronau, I., Hubisz, M. J., Gulko, B., Danko, C. G. & Siepel, A. Bayesian inference of ancient human demography from individual genome sequences. Nature Genet. 43, 1031–1034 (2011).
  7. Veeramah, K. R. et al. An early divergence of KhoeSan ancestors from those of other modern humans is supported by an ABC-based analysis of autosomal resequencing data. Mol. Biol. Evol. 29, 617–630 (2012).
  8. Kimura, M. Evolutionary rate at the molecular level. Nature 217, 624–626 (1968).
  9. Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, 1985).
  10. Ohta, T. Slightly deleterious mutant substitutions in evolution. Nature 246, 96–98 (1973).
  11. Crow, J. F. Genetic loads and the cost of natural selection. Math. Top. Popul. Genet. 1, 128–177 (1970).
  12. Agrawal, A. F. & Whitlock, M. C. Mutation load: the fitness of individuals in populations where deleterious alleles are abundant. Annu. Rev. Ecol. Evol. Syst. 43, 115–135 (2012).
  13. Crow, J. F. 2. The concept of genetic load: a reply. Am. J. Hum. Genet. 15, 310–315 (1963).
  14. Charlesworth, D. & Willis, J. H. Fundamental concepts in genetics: the genetics of inbreeding depression. Nature Rev. Genet. 10, 783–796 (2009).
    This is a broad review of inbreeding depression and heterosis, fitness phenomena that are caused by the presence of deleterious recessive mutations in populations.
  15. Li, J. Z. et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104 (2008).
  16. Henn, B. M., Cavalli-Sforza, L. L. & Feldman, M. W. The great human expansion. Proc. Natl Acad. Sci. USA 109, 17758–17764 (2012).
  17. Agrawal, A. F. & Whitlock, M. C. Inferences about the distribution of dominance drawn from yeast gene knockout data. Genetics 187, 553–566 (2011).
    The distribution of dominance coefficients is directly measured from yeast knockout experiments, showing that large-effect mutations tend to be more recessive than weak-effect mutations.
  18. Mukai, T., Chigusa, S. I., Mettler, L. E. & Crow, J. F. Mutation rate and dominance of genes affecting viability in Drosophila melanogaster. Genetics 72, 335–355 (1972).
  19. Houle, D., Hughes, K. A., Assimacopoulos, S. & Charlesworth, B. The effects of spontaneous mutation on quantitative traits. II. Dominance of mutations with effects on life-history traits. Genet. Res. 70, 27–34 (1997).
  20. Manna, F., Martin, G. & Lenormand, T. Fitness landscapes: an alternative theory for the dominance of mutation. Genetics 189, 923–937 (2011).
  21. Morton, N. E., Crow, J. F. & Muller, H. J. An estimate of the mutational damage in man from data on consanguineous marriages. Proc. Natl Acad. Sci. USA 42, 855–863 (1956).
    This is among the earliest work to empirically measure the mutation load in humans by considering the reduction in fitness due to recessive mutations in consanguineous unions.
  22. Reich, D. E. & Lander, E. S. On the allelic spectrum of human disease. Trends Genet. 17, 502–510 (2001).
  23. Bittles, A. H. & Black, M. L. Consanguinity, human evolution, and complex diseases. Proc. Natl Acad. Sci. USA 107, 1779–1786 (2010).
  24. Szpiech, Z. A. et al. Long runs of homozygosity are enriched for deleterious variation. Am. J. Hum. Genet. 93, 90–102 (2013).
  25. McQuillan, R. et al. Evidence of inbreeding depression on human height. PLoS Genet. 8, e1002655 (2012).
  26. Tabor, H. K. et al. Pathogenic variants for Mendelian and complex traits in exomes of 6,517 European and African Americans: implications for the return of incidental results. Am. J. Hum. Genet. 95, 183–193 (2014).
    Based on analysis of the exome sequences of >6,500 individuals, this study shows that nearly 45% of individuals carry a known variant associated with severe Mendelian diseases.
  27. Xue, Y. et al. Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am. J. Hum. Genet. 91, 1022–1032 (2012).
  28. Li, Y. et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nature Genet. 42, 969–972 (2010).
  29. Erickson, R. P. & Mitchison, N. A. The low frequency of recessive disease: insights from ENU mutagenesis, severity of disease phenotype, GWAS associations, and demography: an analytical review. J. Appl. Genet. 55, 319–327 (2014).
  30. De la Cruz, O. & Raska, P. Population structure at different minor allele frequency levels. BMC Proc. 8, S55 (2014).
  31. Henn, B. M., Gravel, S., Moreno-Estrada, A., Acevedo-Acevedo, S. & Bustamante, C. D. Fine-scale population structure and the era of next-generation sequencing. Hum. Mol. Genet. 19, R221–R226 (2010).
  32. Mathieson, I. & McVean, G. Demography and the age of rare variants. PLoS Genet. 10, e1004528 (2014).
  33. Deshpande, O., Batzoglou, S., Feldman, M. W. & Cavalli-Sforza, L. L. A serial founder effect model for human settlement out of Africa. Proc. Biol. Sci. 276, 291–300 (2009).
  34. 1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 490, 56–65 (2013).
  35. DeGiorgio, M., Jakobsson, M. & Rosenberg, N. A. Explaining worldwide patterns of human genetic variation using a coalescent-based serial founder model of migration outward from Africa. Proc. Natl Acad. Sci. USA 106, 16057–16062 (2009).
  36. Nelson, M. R. et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337, 100–104 (2012).
  37. Tennessen, J. A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
  38. Goode, D. L. et al. Evolutionary constraint facilitates interpretation of genetic variation in resequenced human genomes. Genome Res. 20, 301–310 (2010).
  39. MacArthur, D. G. et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science 335, 823–828 (2012).
  40. Pritchard, J. K. Are rare variants responsible for susceptibility to complex diseases? Am. J. Hum. Genet. 69, 124–137 (2001).
  41. Agarwala, V., Flannick, J., Sunyaev, S., GoT2D Consortium & Altshuler, D. Evaluating empirical bounds on complex disease genetic architecture. Nature Genet. 45, 1418–1427 (2013).
  42. Gibson, G. Rare and common variants: twenty arguments. Nature Rev. Genet. 13, 135–145 (2012).
  43. Maher, M. C., Uricchio, L. H., Torgerson, D. G. & Hernandez, R. D. Population genetics of rare variants and complex diseases. Hum. Hered. 74, 118–128 (2012).
  44. Klopfstein, S. The fate of mutations surfing on the wave of a range expansion. Mol. Biol. Evol. 23, 482–490 (2005).
  45. Marth, G. T. et al. The functional spectrum of low-frequency coding variation. Genome Biol. 12, R84 (2011).
  46. Keinan, A. & Clark, A. G. Recent explosive human population growth has resulted in an excess of rare genetic variants. Science 336, 740–743 (2012).
  47. Boyko, A. R. et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLoS Genet. 4, e1000083 (2008).
    This paper estimates selection coefficients for alleles with different predicted deleterious effects in humans and includes a discussion of methods to infer the DFE via site frequency spectra.
  48. Lohmueller, K. E. et al. Proportionally more deleterious genetic variation in European than in African populations. Nature 451, 994–997 (2008).
    This is a formative paper considering the proportion of deleterious mutations in European-Americans compared to African-Americans based on analysis of an early genome sequencing data set. The higher proportion of deleterious variants in European-Americans was ascribed to increased genetic drift during the Out-of-Africa bottleneck.
  49. Simons, Y. B., Turchin, M. C., Pritchard, J. K. & Sella, G. The deleterious mutation load is insensitive to recent population history. Nature Genet. 46, 220–224 (2014).
    This paper challenges the earlier studies (for example, reference 48) by demonstrating, via simulation, that the average number of deleterious mutations per individual under an additive model should be the same across populations for different human demographic histories.
  50. Casals, F. et al. Whole-exome sequencing reveals a rapid change in the frequency of rare functional variants in a founding population of humans. PLoS Genet. 9, e1003815 (2013).
  51. Do, R. et al. No evidence that selection has been less effective at removing deleterious mutations in Europeans than in Africans. Nature Genet. 47, 126–131 (2015).
  52. Fu, W. et al. Analysis of 6,515 exomes reveals the recent origin of most human protein-coding variants. Nature 493, 216–220 (2013).
  53. Fu, W., Gittelman, R. M., Bamshad, M. J. & Akey, J. M. Characteristics of neutral and deleterious protein-coding variation among individuals and populations. Am. J. Hum. Genet. 95, 421–436 (2014).
    This paper shows that European-American individuals carry slightly more deleterious derived alleles in their genome sequences, on average, than African-Americans under a conservation-based framework to predict variant function; this is consistent with Out-of-Africa bottleneck simulations.
  54. Gravel, S. When is selection effective? bioRxiv (2014).
  55. Lim, E. T. et al. Distribution and medical impact of loss-of-function variants in the Finnish founder population. PLoS Genet. 10, e1004494 (2014).
  56. Sajantila, A. et al. Paternal and maternal DNA lineages reveal a bottleneck in the founding of the Finnish population. Proc. Natl Acad. Sci. USA 93, 12035–12039 (1996).
  57. Gravel, S. et al. Demographic history and rare allele sharing among human populations. Proc. Natl Acad. Sci. USA 108, 11983–11988 (2011).
  58. Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011).
  59. Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012).
  60. Gutenkunst, R. N., Hernandez, R. D., Williamson, S. H. & Bustamante, C. D. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (2009).
  61. Henn, B. M. et al. Hunter-gatherer genomic diversity suggests a southern African origin for modern humans. Proc. Natl Acad. Sci. USA 108, 5154–5162 (2011).
  62. Lohmueller, K. E. The impact of population demography and selection on the genetic architecture of complex traits. PLoS Genet. 10, e1004379 (2014).
    Reprising his earlier work in reference 48, this paper focuses on patterns of deleterious variants over time, given different demographic scenarios of expansion, bottleneck and combinations of demographic events.
  63. Ramachandran, S. et al. Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc. Natl Acad. Sci. USA 102, 15942–15947 (2005).
  64. Sousa, V., Peischl, S. & Excoffier, L. Impact of range expansions on current human genomic diversity. Curr. Opin. Genet. Dev. 29, 22–30 (2014).
  65. Moreau, C. et al. Deep human genealogies reveal a selective advantage to be on an expanding wave front. Science 334, 1148–1150 (2011).
  66. Peischl, S., Dupanloup, I., Kirkpatrick, M. & Excoffier, L. On the accumulation of deleterious mutations during range expansions. Mol. Ecol. 22, 5972–5982 (2013).
    This is a complex simulation study showing that range expansion can result in expansion load from deleterious mutations that rise to a high frequency on a geographical wave front.
  67. Flaxman, S. M. Surfing downhill: when should population range expansion be characterized by reductions in fitness? Mol. Ecol. 22, 5963–5965 (2013).
  68. Coventry, A. et al. Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nature Commun. 1, 131–136 (2010).
  69. Gignoux, C. R., Henn, B. M. & Mountain, J. L. Rapid, global demographic expansions after the origins of agriculture. Proc. Natl Acad. Sci. USA 108, 6044–6049 (2011).
  70. Zheng, H.-X., Yan, S., Qin, Z.-D. & Jin, L. MtDNA analysis of global populations support that major population expansions began before Neolithic time. Sci. Rep. 2, 745 (2012).
  71. Forster, P. Ice ages and the mitochondrial DNA chronology of human dispersals: a review. Phil. Trans. R. Soc. Lond. B. 359, 255–264 (2004).
  72. Gazave, E., Chang, D., Clark, A. G. & Keinan, A. Population growth inflates the per-individual number of deleterious mutations and reduces their mean effect. Genetics 195, 969–978 (2013).
  73. Kamberov, Y. G. et al. Modeling recent human evolution in mice by expression of a selected EDAR variant. Cell 152, 691–702 (2013).
  74. Hernandez, R. D. et al. Classic selective sweeps were rare in recent human evolution. Science 331, 920–924 (2011).
  75. Moschovis, P. P. et al. Childhood anemia at high altitude: risk factors for poor outcomes in severe pneumonia. Pediatrics 132, e1156–e1162 (2013).
  76. Whitlock, M. C. & Bourguet, D. Factors affecting the genetic load in Drosophila: synergistic epistasis and correlations among fitness components. Evolution 54, 1654–1660 (2000).
  77. Fry, J. D. On the rate and linearity of viability declines in Drosophila mutation-accumulation experiments: genomic mutation rates and synergistic epistasis revisited. Genetics 166, 797–806 (2004).
  78. Zuk, O., Hechter, E., Sunyaev, S. R. & Lander, E. S. The mystery of missing heritability: genetic interactions create phantom heritability. Proc. Natl Acad. Sci. USA 109, 1193–1198 (2012).
  79. Arbiza, L. et al. Genome-wide inference of natural selection on human transcription factor binding sites. Nature Genet. 45, 723–729 (2013).
  80. Lohmueller, K. E. The distribution of deleterious genetic variation in human populations. Curr. Opin. Genet. Dev. 29, 139–146 (2014).


Author information


  1. Department of Ecology and Evolution, Stony Brook University, 650 Life Sciences Building, Stony Brook, New York 11794–5245, USA.

    • Brenna M. Henn &
    • Laura R. Botigué
  2. Stanford University School of Medicine, Department of Genetics, 291 Campus Drive, Stanford, California 94305, USA.

    • Carlos D. Bustamante
  3. Cornell University, Department of Molecular Biology and Genetics, 526 Campus Road, Ithaca, New York 14853–2703, USA.

    • Andrew G. Clark
  4. McGill University, Department of Human Genetics and Genome Quebec Innovation Centre, 740 Dr Penfield Avenue, Montreal, Quebec H3A 0G1, Canada.

    • Simon Gravel

Competing interests statement

The authors declare no competing interests.

Author details

  • Brenna M. Henn

    Brenna M. Henn is a population geneticist specializing in the evolution of human genetic diversity. She is currently an assistant professor in the Department of Ecology and Evolution, Stony Brook University, New York, USA. She completed her graduate work in anthropology and postdoctoral work in human genomics at Stanford University, California, USA.

  • Laura R. Botigué

    Laura R. Botigué is a population geneticist focusing on the effect of demography on the genomic architecture of different human populations. She is currently a postdoctoral associate in the Henn Laboratory in the Department of Ecology and Evolution at Stony Brook University, New York, USA. She completed her graduate work in biomedicine at the Universitat Pompeu Fabra, Barcelona, Spain.

  • Carlos D. Bustamante

    Carlos D. Bustamante is a population geneticist who analyses genome-wide patterns of variation to address fundamental questions in biology, anthropology and medicine. He is Professor of Genetics at Stanford University School of Medicine, California, USA, and Co-Founding Director of the Stanford Center for Computational, Evolutionary, and Human Genomics (CEHG), California, USA.

  • Andrew G. Clark

    Andrew G. Clark is a population geneticist who studies several aspects of complex-trait genetics, including the impact of recent human demography on patterns of variation. He is the Jacob Gould Schurman Professor of Molecular Biology and Genetics at Cornell University, Ithaca, New York, USA.

  • Simon Gravel

    Simon Gravel is a population geneticist who specializes in statistical and mathematical models for interpreting population genomic data. He is a Sloan Fellow, holds the Canada Research Chair in Statistical and Population Genetics, and is an assistant professor at the Department of Human Genetics at McGill University, Montreal, Quebec, Canada.

Supplementary information

PDF files

  1. Supplementary information S1 (box) (171 KB)

    Variant Annotation Algorithms

  2. Supplementary information S2 (figure) (249 KB)

    Demographic history based on the site frequency spectrum and sharing of rare alleles.

  3. Supplementary information S3 (figure) (238 KB)

    Allele sharing versus allele frequency among European populations.
