
  • Review Article

An integrated view of protein evolution

Key Points

  • Variations in the rate of protein evolution are determined by biases in the mutation rate and fixation rate (which are either protein specific or linked to genomic location).

  • By drawing on accumulating genomic data, evolutionary studies have moved from studying individual proteins to characterizing global cellular factors.

  • Protein-specific biases in fixation rate are due to differences in both purifying and positive selection across genes.

  • Although theoretical considerations that are based on purifying selection suggest that the importance of a gene (or its dispensability) is a key determinant of protein evolution, experimental data confirm at best a moderate influence.

  • An important concept in thinking about protein evolution is fitness density, that is, the weighted fraction of sites at which mutations result in phenotypes with modified fitness.

  • Selection on protein structure and stability is presumably responsible for the largest contribution to fitness density.

  • The position of a protein in biological networks seems to be only of minor importance, despite much recent excitement.

  • Broadly expressed and highly expressed proteins evolve slowly; expression level is by far the strongest predictor of evolutionary rate in yeast (possibly because of selection for robust folding in highly expressed proteins).

  • Some recent studies suggest that a large fraction (30%) of amino-acid changes might be driven by positive selection, contrary to expectations that are based on the (nearly) neutral theory.

  • Positive selection often reflects compensatory mutations or arms races rather than adaptation.

  • Further research is needed to understand the relative importance of the different factors that affect protein evolution; future studies will be most effective if combined with the development of a coherent theory that is based on population genetics models.
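
The notion of fitness density in the Key Points can be made concrete with a small calculation. The following is a minimal sketch, not the authors' method: the source does not specify the weighting scheme, so the function name `fitness_density`, the per-site `effects` and `weights` inputs, and the `threshold` for calling a site neutral are all assumptions introduced here for illustration, and the numbers are made-up toy values.

```python
from typing import Optional, Sequence


def fitness_density(
    effects: Sequence[float],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.0,
) -> float:
    """Weighted fraction of sites at which a mutation modifies fitness.

    effects   -- hypothetical per-site fitness effect of a mutation (0 = neutral)
    weights   -- hypothetical per-site weights; equal weights if omitted
    threshold -- effects with magnitude at or below this count as neutral
    """
    if weights is None:
        weights = [1.0] * len(effects)
    if len(weights) != len(effects):
        raise ValueError("effects and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Sum the weights of sites whose mutation effect exceeds the threshold.
    affected = sum(w for e, w in zip(effects, weights) if abs(e) > threshold)
    return affected / total


# Made-up toy effects for five sites, purely illustrative (not real data).
toy_effects = [0.0, -0.2, 0.0, -0.05, 0.01]
print(fitness_density(toy_effects))
```

With equal weights this reduces to the plain fraction of non-neutral sites; supplying `weights` lets some sites count more than others, which is one possible reading of "weighted" in the definition above.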

Abstract

Why do proteins evolve at different rates? Advances in systems biology and genomics have facilitated a move from studying individual proteins to characterizing global cellular factors. Systematic surveys indicate that protein evolution is not determined exclusively by selection on protein structure and function, but is also affected by the genomic position of the encoding genes, their expression patterns, their position in biological networks and possibly their robustness to mistranslation. Recent work has allowed insights into the relative importance of these factors. We discuss the status of a much-needed coherent view that integrates studies on protein evolution with biochemistry and functional and structural genomics.


Figure 1: Distribution of mutation effects and evolutionary conservation across a DNA-repair enzyme.
Figure 2: Gene dispensability and rate of protein evolution.
Figure 3: Gene-expression level and rate of protein evolution.
Figure 4: Interdependence between the factors that affect protein evolution.


References

  1. Webster, A. J., Payne, R. J. & Pagel, M. Molecular phylogenies link rates of evolution and speciation. Science 301, 478 (2003).

    Article  CAS  PubMed  Google Scholar 

  2. Cutter, A. D. & Ward, S. Sexual and temporal dynamics of molecular evolution in C. elegans development. Mol. Biol. Evol. 22, 178–188 (2005).

    Article  CAS  PubMed  Google Scholar 

  3. Bromham, L. & Leys, R. Sociality and the rate of molecular evolution. Mol. Biol. Evol. 22, 1393–1402 (2005).

    Article  PubMed  Google Scholar 

  4. Brakmann, S. & Schwienhorst, A. (eds) Evolutionary Methods in Biotechnology: Clever Tricks for Directed Evolution (Wiley, Weinheim, 2004).

    Book  Google Scholar 

  5. Smith, N. G. & Eyre-Walker, A. Human disease genes: patterns and predictions. Gene 318, 169–175 (2003).

    Article  CAS  PubMed  Google Scholar 

  6. Searls, D. B. Pharmacophylogenomics: genes, evolution and drug targets. Nature Rev. Drug Discov. 2, 613–623 (2003). A summary of the potential links between evolutionary genomics and pharmacology.

    Article  CAS  Google Scholar 

  7. Ramani, A. K. & Marcotte, E. M. Exploiting the co-evolution of interacting proteins to discover interaction specificity. J. Mol. Biol. 327, 273–284 (2003).

    Article  CAS  PubMed  Google Scholar 

  8. Wilson, A. C., Carlson, S. S. & White, T. J. Biochemical evolution. Annu. Rev. Biochem. 46, 573–639 (1977). A classical early study that recognized several potential determinants of protein evolution.

    Article  CAS  PubMed  Google Scholar 

  9. Fay, J. C. & Wu, C. I. The neutral theory in the genomic era. Curr. Opin. Genet. Dev. 11, 642–646 (2001).

    Article  CAS  PubMed  Google Scholar 

  10. Kimura, M. The Neutral Theory of Evolution (Cambridge Univ. Press, Cambridge, 1983).

    Book  Google Scholar 

  11. Ohta, T. The nearly neutral theory of molecular evolution. Annu. Rev. Ecol. Syst. 23, 263–286 (1992).

    Article  Google Scholar 

  12. Gillespie, J. H. The Causes of Molecular Evolution (Oxford Univ. Press, Oxford, 1991). References 10–12 are landmark reviews (frequently with opposing views) on the neutral and nearly neutral theories.

  13. Ellegren, H., Smith, N. G. C. & Webster, M. T. Mutation rate variation in the mammalian genome. Curr. Opin. Genet. Dev. 13, 562–568 (2003).

    Article  CAS  PubMed  Google Scholar 

  14. Smith, N. G. C. & Hurst, L. D. The effect of tandem substitutions on the correlation between synonymous and nonsynonymous rates in rodents. Genetics 153, 1395–1402 (1999).

    CAS  PubMed  PubMed Central  Google Scholar 

  15. Lercher, M. J., Williams, E. J. B. & Hurst, L. D. Local similarity in evolutionary rates extends over whole chromosomes in human–rodent and mouse–rat comparisons: Implications for understanding the mechanistic basis of the male mutation bias. Mol. Biol. Evol. 18, 2032–2039 (2001). An analysis of mutation-rate variation across mammalian genomes and its effect on protein evolution.

    Article  CAS  PubMed  Google Scholar 

  16. Lercher, M. J., Chamary, J. V. & Hurst, L. D. Genomic regionality in rates of evolution is not explained by clustering of genes of comparable expression profile. Genome Res. 14, 1002–1013 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  17. Williams, E. J. & Hurst, L. D. The proteins of linked genes evolve at similar rates. Nature 407, 900–903 (2000).

    Article  CAS  PubMed  Google Scholar 

  18. Matassi, G., Sharp, P. M. & Gautier, C. Chromosomal location effects on gene sequence evolution in mammals. Curr. Biol. 9, 786–791 (1999).

    Article  CAS  PubMed  Google Scholar 

  19. Datta, A. & Jinks-Robertson, S. Association of increased spontaneous mutation-rates with high levels of transcription in yeast. Science 268, 1616–1619 (1995).

    Article  CAS  PubMed  Google Scholar 

  20. Lercher, M. J. & Hurst, L. D. Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet. 18, 337–340 (2002).

    Article  CAS  PubMed  Google Scholar 

  21. Rattray, A. J. & Strathern, J. N. Error-prone DNA polymerases: when making a mistake is the only way to get ahead. Annu. Rev. Genet. 37, 31–66 (2003).

    Article  CAS  PubMed  Google Scholar 

  22. Hurst, L. D. & Peck, J. R. Recent advances in understanding the evolution and maintenance of sex. Trends Ecol. Evol. 11, 46–52 (1996).

    Article  CAS  PubMed  Google Scholar 

  23. Birky, C. W. Jr & Walsh, J. B. Effects of linkage on rates of molecular evolution. Proc. Natl Acad. Sci. USA 85, 6414–6418 (1988).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Wyckoff, G. J., Malcom, C. M., Vallender, E. J. & Lahn, B. T. A highly unexpected strong correlation between fixation probability of nonsynonymous mutations and mutation rate. Trends Genet. 21, 381–385 (2005). A remarkable study that suggests that up to 40% of the variation in protein evolutionary rates might be attributable to variation in the underlying mutation rate.

    Article  CAS  PubMed  Google Scholar 

  25. Chamary, J. V., Parmley, J. L. & Hurst, L. D. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nature Rev. Genet. 7, 98–108 (2006).

    Article  CAS  PubMed  Google Scholar 

  26. Smith, J. M. & Haigh, J. The hitch-hiking effect of a favourable gene. Genet. Res. 23, 23–35 (1974).

    Article  CAS  PubMed  Google Scholar 

  27. Betancourt, A. J. & Presgraves, D. C. Linkage limits the power of natural selection in Drosophila. Proc. Natl Acad. Sci. USA 99, 13616–13620 (2002). This paper claims that regional recombinational differences have a strong influence on the fixation of positively selected mutations.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Bierne, N. & Eyre-Walker, A. The genomic rate of adaptive amino acid substitution in Drosophila. Mol. Biol. Evol. 21, 1350–1360 (2004).

    Article  CAS  PubMed  Google Scholar 

  29. Presgraves, D. C. Recombination enhances protein adaptation in Drosophila melanogaster. Curr. Biol. 15, 1651–1656 (2005).

    Article  CAS  PubMed  Google Scholar 

  30. Subramanian, S. & Kumar, S. Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics 168, 373–381 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Wright, S. I., Yau, C. B., Looseley, M. & Meyers, B. C. Effects of gene expression on molecular evolution in Arabidopsis thaliana and Arabidopsis lyrata. Mol. Biol. Evol. 21, 1719–1726 (2004).

    Article  CAS  PubMed  Google Scholar 

  32. Pal, C., Papp, B. & Hurst, L. D. Highly expressed genes in yeast evolve slowly. Genetics 158, 927–931 (2001). The first identification of protein-expression level as a strong predictor of evolutionary rate in yeast.

    CAS  PubMed  PubMed Central  Google Scholar 

  33. Rocha, E. P. C. & Danchin, A. An analysis of determinants of amino acids substitution rates in bacterial proteins. Mol. Biol. Evol. 21, 108–116 (2004). This work (like reference 48) compares the relative importance of several factors that are implicated in protein evolution, identifying expression level as the most important variable.

    Article  CAS  PubMed  Google Scholar 

  34. Gerton, J. L. et al. Inaugural article: global mapping of meiotic recombination hotspots and coldspots in the yeast Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 97, 11383–11390 (2000).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  35. Pal, C., Papp, B. & Hurst, L. D. Does the recombination rate affect the efficiency of purifying selection? The yeast genome provides a partial answer. Mol. Biol. Evol. 18, 2323–2326 (2001).

    Article  CAS  PubMed  Google Scholar 

  36. Bachtrog, D. Protein evolution and codon usage bias on the neo-sex chromosomes of Drosophila miranda. Genetics 165, 1221–1232 (2003).

    CAS  PubMed  PubMed Central  Google Scholar 

  37. Bachtrog, D. Evidence that positive selection drives Y-chromosome degeneration in Drosophila miranda. Nature Genet. 36, 518–522 (2004).

    Article  CAS  PubMed  Google Scholar 

  38. Zuckerkandl, E. Evolutionary processes and evolutionary noise at the molecular level. I. Functional density in proteins. J. Mol. Evol. 7, 167–183 (1976).

    Article  CAS  PubMed  Google Scholar 

  39. Drummond, D. A., Bloom, J. D., Adami, C., Wilke, C. O. & Arnold, F. H. Why highly expressed proteins evolve slowly. Proc. Natl Acad. Sci. USA 102, 14338–14343 (2005). Might highly expressed proteins be under strong selection to avoid protein misfolding? Several tests in this remarkable study indicate that this is the case.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Kondrashov, A. S., Sunyaev, S. & Kondrashov, F. A. Dobzhansky–Muller incompatibilities in protein evolution. Proc. Natl Acad. Sci. USA 99, 14878–13883 (2002). An original study on the frequency and importance of compensatory substitutions.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  41. DePristo, M. A., Weinreich, D. M. & Hartl, D. L. Missense meanderings in sequence space: a biophysical view of protein evolution. Nature Rev. Genet. 6, 678–687 (2005). An original and thought-provoking review that links protein stability and compensatory evolution.

    Article  CAS  PubMed  Google Scholar 

  42. Poon, A., Davis, B. H. & Chao, L. The coupon collector and the suppressor mutation: estimating the number of compensatory mutations by maximum likelihood. Genetics 170, 1323–1332 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  43. Hirsh, A. E. & Fraser, H. B. Protein dispensability and rate of evolution. Nature 411, 1046–1049 (2001). A classical study on the effect of gene 'importance' on protein evolution.

    Article  CAS  PubMed  Google Scholar 

  44. Jordan, I. K., Rogozin, I. B., Wolf, Y. I. & Koonin, E. V. Essential genes are more evolutionarily conserved than are nonessential genes in bacteria. Genome Res. 12, 962–968 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  45. Cutter, A. D. et al. Molecular correlates of genes exhibiting RNAi phenotypes in Caenorhabditis elegans. Genome Res. 13, 2651–2657 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Pal, C., Papp, B. & Hurst, L. D. Rate of evolution and gene dispensability. Nature 421, 496–497 (2003).

    Article  CAS  PubMed  Google Scholar 

  47. Wall, D. P. et al. Functional genomic analysis of the rates of protein evolution. Proc. Natl Acad. Sci. USA 102, 5483–5488 (2005). A sophisticted analysis that aims to disentangle the influences of expression level and dispensability.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  48. Drummond, D. A., Raval, A. & Wilke, C. O. A single determinant dominates the rate of yeast protein evolution. Mol. Biol. Evol., 327–337 (2005). This work (like reference 33) compares the relative importance of several factors that are implicated in protein evolution, and identifies expression level as the most important variable.

  49. Zhang, J. Z. & He, X. L. Significant impact of protein dispensability on the instantaneous rate of protein evolution. Mol. Biol. Evol. 22, 1147–1155 (2005).

    Article  CAS  PubMed  Google Scholar 

  50. Papp, B., Pal, C. & Hurst, L. D. Metabolic network analysis of the causes and evolution of enzyme dispensability in yeast. Nature 429, 661–664 (2004). The 'importance' of a gene is highly environment-specific: about half of all 'dispensable' enzymes in the laboratory are essential in specific environments.

    Article  CAS  PubMed  Google Scholar 

  51. Krylov, D. M., Wolf, Y. I., Rogozin, I. B. & Koonin, E. V. Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res. 13, 2229–2235 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Hurst, L. D. & Smith, N. G. Do essential genes evolve slowly? Curr. Biol. 9, 747–750 (1999).

    Article  CAS  PubMed  Google Scholar 

  53. Torgerson, D. G., Whitty, B. R. & Singh, R. S. Sex-specific functional specialization and the evolutionary rates of essential fertility genes. J. Mol. Evol. 61, 650–658 (2005). Shows that function-specific positive selection, rather than essentiality, seems to explain the evolution of fertility genes.

    Article  CAS  PubMed  Google Scholar 

  54. Pakula, A. A. & Sauer, R. T. Genetic analysis of protein stability and function. Annu. Rev. Genet. 23, 289–310 (1989).

    Article  CAS  PubMed  Google Scholar 

  55. Guo, H. H., Choe, J. & Loeb, L. A. Protein tolerance to random amino acid change. Proc. Natl Acad. Sci. USA 101, 9205–9210 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  56. Dobson, C. M. Principles of protein folding, misfolding and aggregation. Semin. Cell Dev. Biol. 15, 3–16 (2004).

    Article  CAS  PubMed  Google Scholar 

  57. Haney, P. J. et al. Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species. Proc. Natl Acad. Sci. USA 96, 3578–3583 (1999).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Sterner, R. & Liebl, W. Thermophilic adaptation of proteins. Crit. Rev. Biochem. Mol. Biol. 36, 39–106 (2001).

    Article  CAS  PubMed  Google Scholar 

  59. Dokholyan, N. V. & Shakhnovich, E. I. Understanding hierarchical protein evolution from first principles. J. Mol. Biol. 312, 289–307 (2001).

    Article  CAS  PubMed  Google Scholar 

  60. Parisi, G. & Echave, J. Generality of the structurally constrained protein evolution model: assessment on representatives of the four main fold classes. Gene 345, 45–53 (2005).

    Article  CAS  PubMed  Google Scholar 

  61. Dean, A. M., Neuhauser, C., Grenier, E. & Golding, G. B. The pattern of amino acid replacements in α/β-barrels. Mol. Biol. Evol. 19, 1846–1864 (2002).

    Article  CAS  PubMed  Google Scholar 

  62. Goldman, N., Thorne, J. L. & Jones, D. T. Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics 149, 445–458 (1998).

    CAS  PubMed  PubMed Central  Google Scholar 

  63. Bustamante, C. D., Townsend, J. P. & Hartl, D. L. Solvent accessibility and purifying selection within proteins of Escherichia coli and Salmonella enterica. Mol. Biol. Evol. 17, 301–308 (2000).

    Article  CAS  PubMed  Google Scholar 

  64. Koehl, P. & Levitt, M. Protein topology and stability define the space of allowed sequences. Proc. Natl Acad. Sci. USA 99, 1280–1285 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Aris-Brosou, S. Determinants of adaptive evolution at the molecular level: the extended complexity hypothesis. Mol. Biol. Evol. 22, 200–209 (2005).

    Article  CAS  PubMed  Google Scholar 

  66. Fisher, R. The Genetical Theory of Natural Selection (Dover, New York, 1958).

    Google Scholar 

  67. Orr, H. A. The genetic theory of adaptation: a brief history. Nature Rev. Genet. 6, 119–127 (2005). An excellent review on molecular adaptation.

    Article  CAS  PubMed  Google Scholar 

  68. Fraser, H. B., Hirsh, A. E., Steinmetz, L. M., Scharfe, C. & Feldman, M. W. Evolutionary rate in the protein interaction network. Science 296, 750–752 (2002). An influential, but controversial study on the effect of protein interactions on evolution.

    Article  CAS  PubMed  Google Scholar 

  69. Bloom, J. D. & Adami, C. Apparent dependence of protein evolutionary rate on number of interactions is linked to biases in protein–protein interactions data sets. BMC Evol. Biol. 3, 21 (2003).

    Article  PubMed  PubMed Central  Google Scholar 

  70. Hahn, M. W., Conant, G. C. & Wagner, A. Molecular evolution in large genetic networks: does connectivity equal constraint? J. Mol. Evol. 58, 203–211 (2004).

    Article  CAS  PubMed  Google Scholar 

  71. Jordan, I. K., Wolf, Y. I. & Koonin, E. V. No simple dependence between protein evolution rate and the number of protein–protein interactions: only the most prolific interactors tend to evolve slowly. BMC Evol. Biol. 3, 1 (2003).

    Article  PubMed  PubMed Central  Google Scholar 

  72. Agrafioti, I. et al. Comparative analysis of the Saccharomyces cerevisiae and Caenorhabditis elegans protein interaction networks. BMC Evol. Biol. 5, 23 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Teichmann, S. A. The constraints protein–protein interactions place on sequence divergence. J. Mol. Biol. 324, 399–407 (2002).

    Article  CAS  PubMed  Google Scholar 

  74. Mintseris, J. & Weng, Z. Structure, function, and evolution of transient and obligate protein–protein interactions. Proc. Natl Acad. Sci. USA 102, 10930–10935 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  75. Makino, T. & Gojobori, T. The evolutionary rate of a protein is influenced by features of the interacting partners. Mol. Biol. Evol. 23, 784–789 (2006).

    Article  CAS  PubMed  Google Scholar 

  76. Fraser, H. B. Modularity and evolutionary constraint on proteins. Nature Genet. 37, 351–352 (2005).

    Article  CAS  PubMed  Google Scholar 

  77. Jordan, I. K., Marino-Ramirez, L., Wolf, Y. I. & Koonin, E. V. Conservation and coevolution in the scale-free human gene coexpression network. Mol. Biol. Evol. 21, 2058–2070 (2004).

    Article  CAS  PubMed  Google Scholar 

  78. Evangelisti, A. M. & Wagner, A. Molecular evolution in the yeast transcriptional regulation network. J. Exp. Zool. B 302, 392–411 (2004).

    Article  CAS  Google Scholar 

  79. Salathe, M., Ackermann, M. & Bonhoeffer, S. The effect of multi-functionality on the rate of evolution in yeast. Mol. Biol. Evol. 23, 721–722 (2006).

    Article  CAS  PubMed  Google Scholar 

  80. Mizokami, M. et al. Constrained evolution with respect to gene overlap of hepatitis B virus. J. Mol. Evol. 44 (Suppl. 1), 83–90 (1997).

    Article  Google Scholar 

  81. Raff, R. The Shape of Life (Univ. Chicago Press, Chicago, 1996).

    Book  Google Scholar 

  82. Davis, J. C., Brandman, O. & Petrov, D. A. Protein evolution in the context of Drosophila development. J. Mol. Evol. 60, 774–785 (2005).

    Article  CAS  PubMed  Google Scholar 

  83. Hazkani-Covo, E., Wool, D. & Graur, D. In search of the vertebrate phylotypic stage: a molecular examination of the developmental hourglass model and von Baer's third law. J. Exp. Zool. B 304, 150–158 (2005). In agreement with the 'hourglass' model of animal development, genes that are expressed during the phylotypic stage are under strong stabilizing selection.

    Article  Google Scholar 

  84. Castillo-Davis, C. I., Kondrashov, F. A., Hartl, D. L. & Kulathinal, R. J. The functional genomic distribution of protein divergence in two animal phyla: coevolution, genomic conflict, and constraint. Genome Res. 14, 802–811 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Good, J. M. & Nachman, M. W. Rates of protein evolution are positively correlated with developmental timing of expression during mouse spermatogenesis. Mol. Biol. Evol. 22, 1044–1052 (2005).

    Article  CAS  PubMed  Google Scholar 

  86. Duret, L. & Mouchiroud, D. Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol. Biol. Evol. 17, 68–74 (2000). The first demonstration of faster evolution of tissue-specific proteins.

    Article  CAS  PubMed  Google Scholar 

  87. Xing, Y. & Lee, C. Evidence of functional selection pressure for alternative splicing events that accelerate evolution of protein subsequences. Proc. Natl Acad. Sci. USA 102, 13526–13531 (2005). Shows that exons that are used in minor isoform proteins evolve at higher rates than constitutive exons.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  88. Akashi, H. & Gojobori, T. Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc. Natl Acad. Sci. USA 99, 3695–3700 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  89. Akashi, H. Translational selection and yeast proteome evolution. Genetics 164, 1291–1303 (2003).

    CAS  PubMed  PubMed Central  Google Scholar 

  90. Fay, J. C., Wyckoff, G. J. & Wu, C. I. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature 415, 1024–1026 (2002).

    Article  CAS  PubMed  Google Scholar 

  91. Wagner, A. Robustness, evolvability, and neutrality. FEBS Lett. 579, 1772–1778 (2005).

    Article  CAS  PubMed  Google Scholar 

  92. Nielsen, R. et al. A scan for positively selected genes in the genomes of humans and chimpanzees. PLoS Biol. 3, e170 (2005). A comprehensive overview of the gene classes that were shaped by positive selection in human evolutionary history (see also reference 100).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  93. Wichman, H. A., Millstein, J. & Bull, J. J. Adaptive molecular evolution for 13,000 phage generations: a possible arms race. Genetics 170, 19–31 (2005). This work indicates that intraspecies competition might lead to selection for perpetual change.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  94. Zhang, Z., Hambuch, T. M. & Parsch, J. Molecular evolution of sex-biased genes in Drosophila. Mol. Biol. Evol. 21, 2130–2139 (2004).

    Article  CAS  PubMed  Google Scholar 

  95. Poon, A. & Chao, L. The rate of compensatory mutation in the DNA bacteriophage φX174. Genetics 170, 989–999 (2005).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  96. Fares, M. A., Moya, A. & Barrio, E. Adaptive evolution in GroEL from distantly related endosymbiotic bacteria of insects. J. Evol. Biol. 18, 651–660 (2005). This paper (along with others from the same group) indicates that a heat-shock protein might have evolved to mitigate the effect of deleterious substitutions in endosymbionts.

    Article  CAS  PubMed  Google Scholar 

  97. Shim Choi, S., Li, W. & Lahn, B. T. Robust signals of coevolution of interacting residues in mammalian proteomes identified by phylogeny-aided structural analysis. Nature Genet. 37, 1367–1371 (2005).

    Article  CAS  Google Scholar 

  98. Fisher, S. E. & Marcus, G. F. The eloquent ape: genes, brains and the evolution of language. Nature Rev. Genet. 7, 9–20 (2006).

    Article  CAS  PubMed  Google Scholar 

  99. Mekel-Bobrov, N. et al. Ongoing adaptive evolution of ASPM, a brain size determinant in Homo sapiens. Science 309, 1720–1722 (2005).

    Article  CAS  PubMed  Google Scholar 

  100. Bustamante, C. D. et al. Natural selection on protein-coding genes in the human genome. Nature 437, 1153–1157 (2005). A comprehensive overview of the gene classes that were shaped by positive selection in human evolutionary history (see also reference 92).

    Article  CAS  PubMed  Google Scholar 

  101. Koonin, E. V. Systemic determinants of gene evolution and function. Mol. Syst. Biol. 13 Sep 2005 (doi:10.1038/msb4100029).

  102. Chen, Y. & Xu, D. Understanding protein dispensability through machine-learning analysis of high-throughput data. Bioinformatics 21, 575–581 (2005).

    Article  CAS  PubMed  Google Scholar 

  103. Kondrashov, F. A., Ogurtsov, A. Y. & Kondrashov, A. S. Bioinformatical assay of human gene morbidity. Nucleic Acids Res. 32, 1731–1737 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Wolf, Y. I., Carmel, L. & Koonin, E. V. Unifying measures of gene function and evolution. Proc. R. Soc. B (in the press).

  105. Elena, S. F. & Lenski, R. E. Evolution experiments with microorganisms: the dynamics and genetic bases of adaptation. Nature Rev. Genet. 4, 457–469 (2003).

    Article  CAS  PubMed  Google Scholar 

  106. Shendure, J. et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 309, 1728–1732 (2005).

    Article  CAS  PubMed  Google Scholar 

  107. Patthy, L. Protein Evolution (Blackwell Science, Oxford, 1999).

    Google Scholar 

  108. Papp, B., Pal, C. & Hurst, L. D. Dosage sensitivity and the evolution of gene families in yeast. Nature 424, 194–197. (2003).

    Article  CAS  PubMed  Google Scholar 

  109. Jain, R., Rivera, M. C. & Lake, J. A. Horizontal gene transfer among genomes: the complexity hypothesis. Proc. Natl Acad. Sci. USA 96, 3801–3806 (1999).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  110. Nei, M. & Kumar, S. Molecular Evolution and Phylogenetics (Oxford Univ. Press, Oxford, 2000).

    Google Scholar 

  111. Whelan, S., Lio, P. & Goldman, N. Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends Genet. 17, 262–272 (2001).

    Article  CAS  PubMed  Google Scholar 

  112. Abascal, F., Zardoya, R. & Posada, D. ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21, 2104–2105 (2005).

    Article  CAS  PubMed  Google Scholar 

  113. Posada, D. & Crandall, K. A. MODELTEST: testing the model of DNA substitution. Bioinformatics 14, 817–818 (1998).

    Article  CAS  PubMed  Google Scholar 

  114. Goldman, N. & Yang, Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11, 725–736 (1994).

    CAS  PubMed  Google Scholar 

  115. Miller, M. P. & Kumar, S. Understanding human disease mutations through the use of interspecific genetic variation. Hum. Mol. Genet. 10, 2319–2328 (2001).

    Article  CAS  PubMed  Google Scholar 

  116. Ng, P. C. & Henikoff, S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 31, 3812–3814 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  117. Ramensky, V., Bork, P. & Sunyaev, S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res. 30, 3894–3900 (2002).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  118. Rebbeck, T. R., Spitz, M. & Wu, X. F. Assessing the function of genetic variants in candidate gene association studies. Nature Rev. Genet. 5, 589–597 (2004).

    Article  CAS  PubMed  Google Scholar 

  119. Piganeau, G. & Eyre-Walker, A. Estimating the distribution of fitness effects from DNA sequence data: implications for the molecular clock. Proc. Natl Acad. Sci. USA 100, 10335–10340 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  120. Loewe, L., Charlesworth, B., Bartolome, C. & Noel, V. Estimating selection on non-synonymous mutations. Genetics 172, 1079–1092 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  121. Rokyta, D. R., Joyce, P., Caudle, S. B. & Wichman, H. A. An empirical test of the mutational landscape model of adaptation using a single-stranded DNA virus. Nature Genet. 37, 441–444 (2005). References 119–121 attempt to estimate the fitness distribution of mutations; these values are highly relevant to understanding the relative influence of deleterious and advantageous mutations on protein evolution.

    Article  CAS  PubMed  Google Scholar 

  122. Aharoni, A. et al. The 'evolvability' of promiscuous protein functions. Nature Genet. 37, 73–76 (2005).

    Article  CAS  PubMed  Google Scholar 

  123. Davis, J. C. & Petrov, D. A. Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol. 2, 318–326 (2004).

    Article  Google Scholar 

  124. Jordan, I. K., Wolf, Y. I. & Koonin, E. V. Duplicated genes evolve slower than singletons despite the initial rate increase. BMC Evol. Biol. 4, 22 (2004).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  125. Cusack, B. P. & Wolfe, K. H. Changes in alternative splicing of human and mouse genes are accompanied by faster evolution of constitutive exons. Mol. Biol. Evol. 22, 2198–2208 (2005).

    Article  CAS  PubMed  Google Scholar 

  126. Kondrashov, F. A., Rogozin, I. B., Wolf, Y. I. & Koonin, E. V. Selection in the evolution of gene duplications. Genome Biol. 3, RESEARCH0008 (2002). Shows that selection pressure is relaxed for a short period after gene duplication.

  127. Kumar, S. Molecular clocks: four decades of evolution. Nature Rev. Genet. 6, 654–662 (2005). A comprehensive overview of the reasons for evolutionary rate variation across species.

  128. Gillooly, J. F., Allen, A. P., West, G. B. & Brown, J. H. The rate of DNA evolution: effects of body size and temperature on the molecular clock. Proc. Natl Acad. Sci. USA 102, 140–145 (2005).

  129. Wernegreen, J. J. Genome evolution in bacterial endosymbionts of insects. Nature Rev. Genet. 3, 850–861 (2002).

  130. Gillespie, J. H. The role of population size in molecular evolution. Theor. Popul. Biol. 55, 145–156 (1999).

  131. Eyre-Walker, A., Keightley, P. D., Smith, N. G. & Gaffney, D. Quantifying the slightly deleterious mutation model of molecular evolution. Mol. Biol. Evol. 19, 2142–2149 (2002).

  132. Paland, S. & Lynch, M. Transitions to asexuality result in excess amino acid substitutions. Science 311, 990–992 (2006).

  133. Bustamante, C. D. et al. The cost of inbreeding in Arabidopsis. Nature 416, 531–534 (2002). References 132 and 133 show the effect of sex and breeding system on the accumulation of deleterious substitutions.

  134. Bastolla, U., Porto, M., Roman, H. E. & Vendruscolo, M. Connectivity of neutral networks, overdispersion, and structural conservation in protein evolution. J. Mol. Evol. 56, 243–254 (2003).

  135. Glaser, F. et al. ConSurf: identification of functional regions in proteins by surface-mapping of phylogenetic information. Bioinformatics 19, 163–164 (2003).

  136. Deutschbauer, A. M. et al. Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast. Genetics 169, 1915–1925 (2005).

  137. Holstege, F. C. et al. Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95, 717–728 (1998).

  138. Wright, B. E., Longacre, A. & Reimers, J. M. Hypermutation in derepressed operons of Escherichia coli K12. Proc. Natl Acad. Sci. USA 96, 5089–5094 (1999).

  139. Pal, C. & Hurst, L. D. Evidence for co-evolution of gene order and recombination rate. Nature Genet. 33, 392–395 (2003).

  140. von Mering, C. et al. Comparative assessment of large-scale data sets of protein–protein interactions. Nature 417, 399–403 (2002).

  141. Jeong, H., Mason, S. P., Barabasi, A. L. & Oltvai, Z. N. Lethality and centrality in protein networks. Nature 411, 41–42 (2001).

  142. Coulomb, S., Bauer, M., Bernard, D. & Marsolier-Kergoat, M. C. Gene essentiality and the topology of protein interaction networks. Proc. Biol. Sci. 272, 1721–1725 (2005).

Acknowledgements

The authors wish to thank L. Hurst and L. Loewe for their insightful comments. C.P. and B.P. are supported by the Hungarian Scientific Research Fund (OTKA). C.P. is also supported by an EMBO (European Molecular Biology Organization) Long-term Fellowship. B.P. is a fellow of the Human Frontier Science Program. M.J.L. acknowledges financial support from the DFG (Deutsche Forschungsgemeinschaft).

Author information

Corresponding author

Correspondence to Martin J. Lercher.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Related links

FURTHER INFORMATION

Genome-wide Analysis papers

Joe Felsenstein's web page of Phylogeny Programs

Martin Lercher's web page

MEGA — Molecular Evolutionary Genetics Analysis

PolyPhen — Prediction of Functional Effect of Human nsSNPs

Protein Data Bank

PyMOL homepage

Sorting Intolerant From Tolerant (SIFT) database

The ConSurf server

UniProtKB

Glossary

Genetic drift

The stochastic changes in allele frequencies in a population that occur owing to random sampling effects in the formation of successive generations.

Purifying selection

The removal of a deleterious genetic variant from the population owing to the reduced reproductive success of its carriers.

Positive selection

The accelerated spread of a beneficial genetic variant in the population owing to the increased reproductive success of its carriers.

Dispensability

A measure that is inversely related to the overall importance of a gene. It is usually approximated by the fitness (or growth rate) of the corresponding gene knockout strain under various laboratory conditions.

Transition matrix

A matrix that contains the probabilities of each type of amino-acid substitution for a given period of evolution.

Maximum-likelihood framework

A method that takes a model (for example, of sequence evolution) and searches for the combination of parameter values that best describes the observed data (for example, the aligned sequences).

Synonymous (change)

A nucleotide change in the protein-coding region of a gene that leaves the encoded amino acid unchanged.

Nearly neutral (mutation)

A mutation is nearly neutral when its fitness effect is too small to be governed only by selection, and so its fate is determined largely by genetic drift.

Non-synonymous (change)

A nucleotide change in the protein-coding region of a gene that alters the encoded amino acid.

Interference (Hill–Robertson effects)

A reduction in the efficiency with which selection acts simultaneously at genetically linked sites, especially in regions of low recombination.

Fitness density

The proportion of residues in a protein that are under natural selection, with the contribution of each site weighted by the fitness effects of mutations. Besides functional requirements, selection can favour many fitness components, including stability and robustness against errors. Therefore, fitness density is expected to be higher than functional density.

Imprinted gene

A gene in which expression is determined by the parent from which it is inherited.

Effective population size

The number of individuals in a population that contribute to the next generation. It is generally much smaller than the number of individuals in the population, and is influenced by factors that include population structure, sex ratio, mating system and age distribution.

Essential protein

One for which deletion of the encoding gene results in a lethal phenotype, which is usually measured under laboratory conditions.

Orthologous

Proteins that are encoded by genes that evolved from a common ancestral gene through speciation.

Protein designability

The number of possible amino-acid sequences that are compatible with a given protein structure.

Overdispersion

When the variance in the substitution rate across lineages exceeds its mean. This indicates that the substitution process does not follow a Poisson distribution.

Module

A discrete entity that is isolated through spatial localization, gene-expression pattern, chemical specificity or position in a biological network (for example, a protein complex, or a metabolic or signal-transduction pathway). Ideally, the biological function of a module is separable from that of other modules.

Overlapping reading frames

Adjacent protein-coding genes that share one or more nucleotides.

Sexual selection

Competition among members of one sex for mating opportunities with the other sex.

Gene conversion

Non-reciprocal transfer between a pair of non-allelic or allelic DNA sequences during meiosis and mitosis, such that the receiving sequence becomes more similar to the donating sequence.

Codon usage bias

The non-random usage of synonymous codons for the same amino acid.
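
As an illustration of the glossary entry on overdispersion, the following minimal sketch computes an index of dispersion (variance divided by mean) for substitution counts across lineages. It assumes that substitutions are counted over comparable time spans in each lineage; the function name and the example counts are hypothetical and are not taken from the article. Under a Poisson substitution process the ratio is expected to be close to 1, so values well above 1 suggest overdispersion.

```python
from statistics import mean, variance


def index_of_dispersion(counts):
    """Return the sample variance divided by the mean of substitution counts.

    counts: number of substitutions observed for one protein in each lineage
    (assumed here to be measured over comparable time spans).
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("mean substitution count is zero")
    return variance(counts) / m


# Hypothetical substitution counts for one protein in five lineages.
counts = [4, 9, 2, 15, 5]
r = index_of_dispersion(counts)

# A ratio above 1 means the variance exceeds the mean (overdispersion).
print(f"index of dispersion = {r:.2f}; overdispersed: {r > 1}")
```

The sample variance (n − 1 denominator) is used here as one reasonable choice; with only a handful of lineages the estimate is noisy, so a ratio slightly above 1 should not be read as strong evidence against a Poisson process.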

About this article

Cite this article

Pál, C., Papp, B. & Lercher, M. An integrated view of protein evolution. Nat Rev Genet 7, 337–348 (2006). https://doi.org/10.1038/nrg1838

  • DOI: https://doi.org/10.1038/nrg1838
