Natural selection on protein-coding genes in the human genome

Abstract

Comparisons of DNA polymorphism within species to divergence between species enable the discovery of molecular adaptation in evolutionarily constrained genes as well as the differentiation of weak from strong purifying selection (refs 1–4). The extent to which weak negative and positive darwinian selection have driven the molecular evolution of different species varies greatly (refs 5–16), with some species, such as Drosophila melanogaster, showing strong evidence of pervasive positive selection (refs 6–9), and others, such as the selfing weed Arabidopsis thaliana, showing an excess of deleterious variation within local populations (refs 9, 10). Here we contrast patterns of coding sequence polymorphism identified by direct sequencing of 39 humans for over 11,000 genes to divergence between humans and chimpanzees, and find strong evidence that natural selection has shaped the recent molecular evolution of our species. Our analysis discovered 304 (9.0%) out of 3,377 potentially informative loci showing evidence of rapid amino acid evolution. Furthermore, 813 (13.5%) out of 6,033 potentially informative loci show a paucity of amino acid differences between humans and chimpanzees, indicating weak negative selection and/or balancing selection operating on mutations at these loci. We find that the distribution of negatively and positively selected genes varies greatly among biological processes and molecular functions, and that some classes, such as transcription factors, show an excess of rapidly evolving genes, whereas others, such as cytoskeletal proteins, show an excess of genes with extensive amino acid polymorphism within humans and yet little amino acid divergence between humans and chimpanzees.
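
As a rough illustration of the polymorphism-versus-divergence contrast described in the abstract, the following is a minimal Python sketch of a classic McDonald–Kreitman-style 2×2 comparison. It is not the mkprf analysis used in the paper: it swaps in a simple Fisher exact test and a neutrality index, and the function names (`fisher_exact_two_sided`, `neutrality_index`) are hypothetical. The example counts are the Adh values commonly quoted from ref. 3 and are used here only as an assumed test input.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def neutrality_index(dn, ds, pn, ps):
    """(Pn/Ps) / (Dn/Ds); undefined when ps or dn is zero.

    Values below 1 indicate an excess of amino acid divergence,
    values above 1 an excess of amino acid polymorphism.
    """
    return (pn / ps) / (dn / ds)


# Assumed example counts (Adh, as commonly quoted from ref. 3):
# fixed differences between species and polymorphisms within species.
dn, ds = 7, 17   # nonsynonymous / synonymous fixed differences
pn, ps = 2, 42   # nonsynonymous / synonymous polymorphisms

p_value = fisher_exact_two_sided(dn, ds, pn, ps)
ni = neutrality_index(dn, ds, pn, ps)
print(f"neutrality index = {ni:.3f}, Fisher exact p = {p_value:.4f}")
```

A per-gene test of this kind has little power when the cell counts are small, which is one reason a pooled or model-based approach may be preferred for genome-wide data.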

Figure 1: Summary distributions of McDonald–Kreitman cell entries and mkprf analyses.
Figure 2: A selection map of the human genome.
Figure 3: Association between negative selection and disease.

References

  1. Kimura, M. The Neutral Theory of Molecular Evolution (Cambridge Univ. Press, Cambridge, 1983)

  2. Hudson, R. R., Kreitman, M. & Aguadé, M. A test of neutral molecular evolution based on nucleotide data. Genetics 116, 153–159 (1987)

  3. McDonald, J. H. & Kreitman, M. Adaptive protein evolution at the Adh locus in Drosophila. Nature 351, 652–654 (1991)

  4. Sawyer, S. A. & Hartl, D. L. Population genetics of polymorphism and divergence. Genetics 132, 1161–1176 (1992)

  5. Eyre-Walker, A. & Keightley, P. D. High genomic deleterious mutation rates in hominids. Nature 397, 344–347 (1999)

  6. Fay, J. C., Wyckoff, G. J. & Wu, C. I. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415, 1024–1026 (2002)

  7. Sawyer, S. A., Kulathinal, R. J., Bustamante, C. D. & Hartl, D. L. Bayesian analysis suggests that most amino acid replacements in Drosophila are driven by positive selection. J. Mol. Evol. 57 (suppl. 1), S154–S164 (2003)

  8. Smith, N. G. & Eyre-Walker, A. Adaptive protein evolution in Drosophila. Nature 415, 1022–1024 (2002)

  9. Bustamante, C. D. et al. The cost of inbreeding in Arabidopsis. Nature 416, 531–534 (2002)

  10. Nordborg, M. et al. The pattern of polymorphism in Arabidopsis thaliana. PLoS Biol. 3, e196 (2005)

  11. Halushka, M. K. et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nature Genet. 22, 239–247 (1999)

  12. Cargill, M. et al. Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet. 22, 231–238 (1999)

  13. Stephens, J. C. et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science 293, 489–493 (2001)

  14. Livingston, R. J. et al. Pattern of sequence variation across 213 environmental response genes. Genome Res. 14, 1821–1831 (2004)

  15. Sunyaev, S. et al. Prediction of deleterious human alleles. Hum. Mol. Genet. 10, 591–597 (2001)

  16. Williamson, S. et al. Simultaneous inference of selection and population growth from patterns of variation in the human genome. Proc. Natl Acad. Sci. USA 102, 7882–7887 (2005)

  17. Barrier, M., Bustamante, C. D., Yu, J. & Purugganan, M. D. Selection on rapidly evolving proteins in the Arabidopsis genome. Genetics 163, 723–733 (2003)

  18. Nielsen, R. et al. A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 3, e170 (2005)

  19. Manunta, P. et al. Alpha-adducin polymorphisms and renal sodium handling in essential hypertensive patients. Kidney Int. 53, 1471–1478 (1998)

  20. Morrison, A. C., Bray, M. S., Folsom, A. R. & Boerwinkle, E. ADD1 460W allele associated with cardiovascular disease in hypertensive individuals. Hypertension 39, 1053–1057 (2002)

  21. Kent, W. J. BLAT–the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002)

  22. Weinreich, D. M. & Rand, D. M. Contrasting patterns of nonneutral evolution in proteins encoded in nuclear and mitochondrial genomes. Genetics 156, 385–399 (2000)

  23. Williamson, S., Fledel-Alon, A. & Bustamante, C. D. Population genetics of polymorphism and divergence for diploid selection models with arbitrary dominance. Genetics 168, 463–475 (2004)

  24. Ioerger, T. R., Clark, A. G. & Kao, T. H. Polymorphism at the self-incompatibility locus in Solanaceae predates speciation. Proc. Natl Acad. Sci. USA 87, 9732–9735 (1990)

Acknowledgements

We thank K. Thornton and B. Payseur for suggestions during the analysis. Some of the analysis was supported by NIH grants to C.D.B., R.N. and A.G.C. We also acknowledge the help of J. Pillardy and the Cornell University Theory Center Computational Biology Service Unit.

Author Contributions

S.G., D.M.T., D.C., T.J.W., J.J.S., M.D.A. and M.C. conceived, designed and performed the experiments. C.D.B., A.F.-A., A.G.C., S.W., R.N. and M.J.H. analysed the data.

Author information

Corresponding author

Correspondence to Carlos D. Bustamante.

Ethics declarations

Competing interests

Accession numbers for the SNP markers analysed in this study are dbSNP numbers ss48401226–ss48429818 and ss48429821–ss48431291, submitted under the handle APPLERA_GI. Reprints and permissions information is available at npg.nature.com/reprintsandpermissions. The authors declare no competing financial interests.

Supplementary information

Supplementary Table 1

A spreadsheet file containing the Mann–Whitney and Z-test results for all Panther classifications of molecular function and biological process. (XLS 139 kb)

Supplementary Data 2

A text file with one line per gene giving the cell entries in the McDonald–Kreitman tables, estimated selection intensities and confidence intervals, as well as posterior P-values. (TXT 1086 kb)

Supplementary Methods

A detailed description of how the single nucleotide polymorphisms analysed in this paper were discovered and validated. Also includes details on bioinformatic controls and quality checks. (DOC 131 kb)

Supplementary Data 1

Provides a detailed mathematical description of the statistical method we employ in this paper as well as details of coalescent simulations used to gauge robustness to demographic misspecification. (PDF 798 kb)

Supplementary Figure 1

Relationship between scaled McDonald–Kreitman cell entries and posterior mean of the selection coefficient γ for all genes in the INS data set. (PDF 1614 kb)

Supplementary Figure 2

Scatterplot of log-odds posterior of negative selection. (PDF 45 kb)

Supplementary Figure Legends

Text to accompany the above Supplementary Figures. (DOC 40 kb)

About this article

Cite this article

Bustamante, C., Fledel-Alon, A., Williamson, S. et al. Natural selection on protein-coding genes in the human genome. Nature 437, 1153–1157 (2005). https://doi.org/10.1038/nature04240

  • DOI: https://doi.org/10.1038/nature04240
