Letter | Published:

Natural selection on protein-coding genes in the human genome


Comparisons of DNA polymorphism within species to divergence between species enable the discovery of molecular adaptation in evolutionarily constrained genes as well as the differentiation of weak from strong purifying selection1,2,3,4. The extent to which weak negative and positive darwinian selection have driven the molecular evolution of different species varies greatly5,6,7,8,9,10,11,12,13,14,15,16, with some species, such as Drosophila melanogaster, showing strong evidence of pervasive positive selection6,7,8,9, and others, such as the selfing weed Arabidopsis thaliana, showing an excess of deleterious variation within local populations9,10. Here we contrast patterns of coding sequence polymorphism identified by direct sequencing of 39 humans for over 11,000 genes to divergence between humans and chimpanzees, and find strong evidence that natural selection has shaped the recent molecular evolution of our species. Our analysis discovered 304 (9.0%) out of 3,377 potentially informative loci showing evidence of rapid amino acid evolution. Furthermore, 813 (13.5%) out of 6,033 potentially informative loci show a paucity of amino acid differences between humans and chimpanzees, indicating weak negative selection and/or balancing selection operating on mutations at these loci. We find that the distribution of negatively and positively selected genes varies greatly among biological processes and molecular functions, and that some classes, such as transcription factors, show an excess of rapidly evolving genes, whereas others, such as cytoskeletal proteins, show an excess of genes with extensive amino acid polymorphism within humans and yet little amino acid divergence between humans and chimpanzees.
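The contrast between polymorphism and divergence described above can be illustrated with the classic McDonald–Kreitman contingency test (ref. 3). The sketch below is not the Bayesian analysis used in this study. It is a minimal frequentist version that assumes a single gene summarised by four counts: amino acid (dn) and synonymous (ds) fixed differences between species, and amino acid (pn) and synonymous (ps) polymorphisms within a species. The function names and the example counts are hypothetical and chosen only for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    tail = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, tail)


def mk_test(dn, ds, pn, ps):
    """Return (neutrality index, p-value) for one McDonald-Kreitman table.

    Neutrality index = (pn / ps) / (dn / ds).
    Values below 1 indicate an excess of amino acid divergence;
    values above 1 indicate an excess of amino acid polymorphism.
    """
    if dn == 0 or ds == 0 or ps == 0:
        raise ValueError("neutrality index is undefined when dn, ds or ps is zero")
    neutrality_index = (pn / ps) / (dn / ds)
    p_value = fisher_exact_two_sided(dn, ds, pn, ps)
    return neutrality_index, p_value


# Hypothetical counts for a rapidly evolving gene (not taken from the paper).
ni, p = mk_test(dn=20, ds=10, pn=5, ps=25)
print(f"neutrality index = {ni:.2f}, p = {p:.4g}")
```

Under these assumptions, a gene with many amino acid fixed differences but few amino acid polymorphisms gives a neutrality index below 1, the pattern the abstract calls rapid amino acid evolution. A gene with extensive amino acid polymorphism but little amino acid divergence gives a neutrality index above 1.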



  1. Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge, 1983)

  2. Hudson, R. R., Kreitman, M. & Aguadé, M. A test of neutral molecular evolution based on nucleotide data. Genetics 116, 153–159 (1987)

  3. McDonald, J. H. & Kreitman, M. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654 (1991)

  4. Sawyer, S. A. & Hartl, D. L. Population genetics of polymorphism and divergence. Genetics 132, 1161–1176 (1992)

  5. Eyre-Walker, A. & Keightley, P. D. High genomic deleterious mutation rates in hominids. Nature 397, 344–347 (1999)

  6. Fay, J. C., Wyckoff, G. J. & Wu, C. I. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415, 1024–1026 (2002)

  7. Sawyer, S. A., Kulathinal, R. J., Bustamante, C. D. & Hartl, D. L. Bayesian analysis suggests that most amino acid replacements in Drosophila are driven by positive selection. J. Mol. Evol. 57 (suppl. 1), S154–S164 (2003)

  8. Smith, N. G. & Eyre-Walker, A. Adaptive protein evolution in Drosophila. Nature 415, 1022–1024 (2002)

  9. Bustamante, C. D. et al. The cost of inbreeding in Arabidopsis. Nature 416, 531–534 (2002)

  10. Nordborg, M. et al. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3, e196 (2005)

  11. Halushka, M. K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nature Genet. 22, 239–247 (1999)

  12. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet. 22, 231–238 (1999)

  13. Stephens, J. C. et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science 293, 489–493 (2001)

  14. Livingston, R. J. et al. Pattern of sequence variation across 213 environmental response genes. Genome Res. 14, 1821–1831 (2004)

  15. Sunyaev, S. et al. Prediction of deleterious human alleles. Hum. Mol. Genet. 10, 591–597 (2001)

  16. Williamson, S. et al. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc. Natl Acad. Sci. USA 102, 7882–7887 (2005)

  17. Barrier, M., Bustamante, C. D., Yu, J. & Purugganan, M. D. Selection on rapidly evolving proteins in the Arabidopsis genome. Genetics 163, 723–733 (2003)

  18. Nielsen, R. et al. A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 3, e170 (2005)

  19. Manunta, P. et al. Alpha-adducin polymorphisms and renal sodium handling in essential hypertensive patients. Kidney Int. 53, 1471–1478 (1998)

  20. Morrison, A. C., Bray, M. S., Folsom, A. R. & Boerwinkle, E. ADD1 460W allele associated with cardiovascular disease in hypertensive individuals. Hypertension 39, 1053–1057 (2002)

  21. Kent, W. J. BLAT–the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002)

  22. Weinreich, D. M. & Rand, D. M. Contrasting patterns of nonneutral evolution in proteins encoded in nuclear and mitochondrial genomes. Genetics 156, 385–399 (2000)

  23. Williamson, S., Fledel-Alon, A. & Bustamante, C. D. Population genetics of polymorphism and divergence for diploid selection models with arbitrary dominance. Genetics 168, 463–475 (2004)

  24. Ioerger, T. R., Clark, A. G. & Kao, T. H. Polymorphism at the self-incompatibility locus in Solanaceae predates speciation. Proc. Natl Acad. Sci. USA 87, 9732–9735 (1990)



We thank K. Thornton and B. Payseur for suggestions during the analysis. Some of the analysis was supported by NIH grants to C.D.B., R.N. and A.G.C. We also acknowledge the help of J. Pillardy and the Cornell University Theory Center Computational Biology Service Unit.

Author Contributions

S.G., D.M.T., D.C., T.J.W., J.J.S., M.D.A. and M.C. conceived, designed and performed the experiments. C.D.B., A.F.-A., A.G.C., S.W., R.N. and M.J.H. analysed the data.

Author information

Correspondence to Carlos D. Bustamante.

Ethics declarations

Competing interests

Accession numbers for the SNP markers analysed in this study are dbSNP numbers ss48401226–ss48429818 and ss48429821–ss48431291, submitted under the handle APPLERA_GI. Reprints and permissions information is available at npg.nature.com/reprintsandpermissions. The authors declare no competing financial interests.

Supplementary information

Supplementary Table 1

A spreadsheet file containing the Mann–Whitney and Z-test results for all Panther classifications of molecular function and biological process. (XLS 139 kb)

Supplementary Data 2

A text file with one line per gene giving the cell entries in the McDonald–Kreitman tables, estimated selection intensities and confidence intervals, as well as posterior P-values. (TXT 1086 kb)

Supplementary Methods

A detailed description of how the single nucleotide polymorphisms analysed in this paper were discovered and validated. Also includes details on bioinformatic controls and quality checks. (DOC 131 kb)

Supplementary Data 1

Provides a detailed mathematical description of the statistical method we employ in this paper as well as details of coalescent simulations used to gauge robustness to demographic misspecification. (PDF 798 kb)

Supplementary Figure 1

Relationship between scaled McDonald–Kreitman cell entries and posterior mean of the selection coefficient γ for all genes in the INS data set. (PDF 1614 kb)

Supplementary Figure 2

Scatterplot of log-odds posterior of negative selection . (PDF 45 kb)

Supplementary Figure Legends

Text to accompany the above Supplementary Figures. (DOC 40 kb)


Figure 1: Summary distributions of McDonald–Kreitman cell entries and mkprf analyses.
Figure 2: A selection map of the human genome.
Figure 3: Association between negative selection and disease.

