Causes of evolutionary rate variation among protein sites

Key Points

  • The rate of evolution varies among sites within proteins owing to structural and functional constraints.

  • The main pattern of variation is due to structural constraints: evolutionary rates increase from the slowly evolving, solvent-inaccessible, tightly packed and rigid protein interior, to the rapidly evolving, solvent-exposed and loosely packed protein surface.

  • Functional constraints result in the slow evolution of sites that are directly involved in protein function and their neighbours. There may also be longer range effects on distant sites.

  • According to mechanistic biophysical models, site-specific evolutionary rates are related to mutational changes of thermodynamic stability. Under these models, structural predictors such as solvent accessibility and local packing act as proxies for mutational stability changes.

  • Our understanding of rate variation among sites remains limited: at best, current models explain approximately 60% of the observed variance in site-specific rates, and in many cases these models explain considerably less.

  • To make further progress, we need to develop better rate inference methods, complete the list of structural and functional molecular features that correlate with rates, and undertake further research on theoretical models derived from first principles.
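
As an illustration of the local packing predictor mentioned in the key points, the following minimal sketch computes a weighted contact number (WCN) for each site. It assumes the commonly used definition, in which the WCN of site i is the sum of inverse squared distances to all other sites, and it assumes one representative coordinate per residue (for example the Cα atom). The function name and the toy coordinates are hypothetical and are not taken from the article.

```python
import math


def weighted_contact_number(coords):
    """Return WCN_i = sum over j != i of 1 / r_ij**2 for every site i.

    coords is a list of (x, y, z) tuples, one per residue. Which atom
    represents a residue (e.g. C-alpha) is an assumption of this sketch.
    """
    wcn = []
    for i, a in enumerate(coords):
        total = 0.0
        for j, b in enumerate(coords):
            if i != j:
                # math.dist gives the Euclidean distance between two points
                total += 1.0 / math.dist(a, b) ** 2
        wcn.append(total)
    return wcn


# Hypothetical toy input: three sites on a line, 2 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
toy_wcn = weighted_contact_number(toy_coords)
```

In this toy case the middle site has the largest WCN because it is closest, on average, to the other sites; under the pattern described above, such tightly packed sites would be expected to evolve more slowly.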


It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific substitution rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
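
To make the notion of "variance explained" concrete, the sketch below computes the squared Pearson correlation between a single structural predictor and site-specific rates. This is one common way to quantify explained variance for a linear, single-predictor relationship; the text does not state how the 60% figure was obtained, so treating it as a squared correlation is an assumption. The function names and the toy numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def variance_explained(predictor, rates):
    """Fraction of rate variance captured by a linear fit on one predictor."""
    return pearson_r(predictor, rates) ** 2


# Hypothetical toy data: packing density per site and relative rate per site.
toy_packing = [0.9, 0.7, 0.5, 0.3]
toy_rates = [0.2, 0.6, 0.9, 1.5]

r = pearson_r(toy_packing, toy_rates)           # negative: denser packing, slower rate
r_squared = variance_explained(toy_packing, toy_rates)
```

With real data, a value of about 0.6 would correspond to the best case quoted above, and lower values to the many cases in which models explain considerably less.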

Figure 1: Structural and functional constraints shape site-specific evolutionary divergence.
Figure 2: WCN correlates more strongly than RSA with site-specific rate.
Figure 3: Predictors of evolutionary variation can help to identify important sites in a protein.
Figure 4: The trade-off between native stability and active stability.

  1. 1

    Zhang, J. & Yang, J.-R. Determinants of the rate of protein sequence evolution. Nat. Rev. Genet. 16, 409–420 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  2. 2

    Yang, Z. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994).

    CAS  PubMed  Google Scholar 

  3. 3

    Yang, Z. Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol. Evol. 11, 367–372 (1996).

    CAS  PubMed  Google Scholar 

  4. 4

    Lartillot, N. & Phillipe, H. A. Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109 (2004).

    CAS  PubMed  Google Scholar 

  5. 5

    Yang, Z. Computational Molecular Evolution (Oxford Univ. Press, 2006).

    Google Scholar 

  6. 6

    Holder, M. T., Zwickl, D. J. & Dessimoz, C. Evaluating the robustness of phylogenetic methods to among-site variability in substitution processes. Phil. Trans. R. Soc. B 363, 4013–4021 (2008).

    CAS  PubMed  Google Scholar 

  7. 7

    Wang, H. C., Li, K., Susko, E. & Roger, A. J. A class frequency mixture model that adjusts for site-specific amino acid frequencies and improves inference of protein phylogeny. BMC Evol. Biol. 8, 331 (2008).

    PubMed  PubMed Central  Google Scholar 

  8. 8

    Le, S. Q., Dang, C. C. & Gascuel, O. Modeling protein evolution with several amino acid replacement matrices depending on site rates. Mol. Biol. Evol. 29, 2921–2936 (2012).

    CAS  PubMed  Google Scholar 

  9. 9

    Yang, Z. H., Nielsen, R., Goldman, N. & Pedersen, A. M. K. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155, 431–449 (2000).

    CAS  PubMed  PubMed Central  Google Scholar 

  10. 10

    Buckley, T. R., Simon, C. & Chambers, G. K. Exploring among-site rate variation models in a maximum likelihood framework using empirical data: effects of model assumptions on estimates of topology, branch lengths, and bootstrap support. Syst. Biol. 50, 67–86 (2001).

    CAS  PubMed  Google Scholar 

  11. 11

    Mayrose, I., Friedman, N. & Pupko, T. A gamma mixture model better accounts for among site rate heterogeneity. Bioinformatics 21, ii151–ii158 (2005).

    CAS  PubMed  Google Scholar 

  12. 12

    Delport, W., Scheffler, K., Gravenor, M. B., Muse, S. V. & Kosakovsky Pond, S. L. Benchmarking multi-rate codon models. PLoS ONE 5, e11587 (2010).

    PubMed  PubMed Central  Google Scholar 

  13. 13

    Lartillot, N. Probabilistic models of eukaryotic evolution: time for integration. Phil. Trans. R. Soc. B 370, 20140338 (2015).

    PubMed  Google Scholar 

  14. 14

    Liberles, D. A., Teufel, A. I., Liu, L. & Stadler, T. On the need for mechanistic models in computational genomics and metagenomics. Genome Biol. Evol. 5, 2008–2018 (2013).

    PubMed  PubMed Central  Google Scholar 

  15. 15

    Perutz, M. F., Kendrew, J. C. & Watson, H. C. Structure and function of haemoglobin: II. Some relations between polypeptide chain configuration and amino acid sequence. J. Mol. Biol. 13, 669–678 (1965).

    CAS  Google Scholar 

  16. 16

    Kimura, M. & Ota, T. On some principles governing molecular evolution. Proc. Natl Acad. Sci. USA 71, 2848–2852 (1974).

    CAS  PubMed  Google Scholar 

  17. 17

    Dean, A. M., Neuhauser, C., Grenier, E. & Golding, G. B. The pattern of amino acid replacements in α/β-barrels. Mol. Biol. Evol. 19, 1846–1864 (2002). One of the first studies to consider both structural and functional determinants of site-specific amino acid substitution rates.

    CAS  PubMed  Google Scholar 

  18. 18

    Franzosa, E. A. & Xia, Y. Structural determinants of protein evolution are context-sensitive at the residue level. Mol. Biol. Evol. 26, 2387–2395 (2009). This landmark study found that that site-specific rate ( dN/dS ) increases linearly with solvent accessibility in yeast.

    CAS  PubMed  Google Scholar 

  19. 19

    Shih, C.-H. & Hwang, J.-K. Evolutionary information hidden in a single protein structure. Proteins 80, 1647–1657 (2012).

    CAS  PubMed  Google Scholar 

  20. 20

    Nevin Gerek, Z., Kumar, S. & Banu Ozkan, S. Structural dynamics flexibility informs function and evolution at a proteome scale. Evol. Appl. 6, 423–433 (2013).

    PubMed  PubMed Central  Google Scholar 

  21. 21

    Marsh, J. A. & Teichmann, S. A. Parallel dynamics and evolution: protein conformational fluctuations and assembly reflect evolutionary changes in sequence and structure. BioEssays 36, 209–218 (2014).

    CAS  PubMed  Google Scholar 

  22. 22

    Shahmoradi, A. et al. Predicting evolutionary site variability from structure in viral proteins: buriedness, packing, flexibility, and design. J. Mol. Evol. 79, 130–142 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  23. 23

    Yeh, S.-W. et al. Site-specific structural constraints on protein sequence evolutionary divergence: local packing density versus solvent exposure. Mol. Biol. Evol. 31, 135–139 (2014). First study showing that site-specific rates correlate more strongly with WCN than with RSA.

    CAS  PubMed  Google Scholar 

  24. 24

    Huang, T.-T., Del Valle Marcos, M. L., Hwang, J.-K. & Echave, J. A mechanistic stress model of protein evolution accounts for site-specific evolutionary rates and their relationship with packing density and flexibility. BMC Evol. Biol. 14, 78 (2014). This paper introduces the stress model of protein evolution, a biophysical model based on mutational changes of active-state stability.

    PubMed  PubMed Central  Google Scholar 

  25. 25

    Echave, J., Jackson, E. L. & Wilke, C. O. Relationship between protein thermodynamic constraints and variation of evolutionary rates among sites. Phys. Biol. 12, 025002 (2015). Study of rate variation among sites using the native-stability threshold biophysical model.

    PubMed  PubMed Central  Google Scholar 

  26. 26

    Meyer, A. G., Spielman, S. J., Bedford, T. & Wilke, C. O. Time dependence of evolutionary metrics during the 2009 pandemic influenza virus outbreak. Virus Evol. 1, vev006–vev010 (2015).

    PubMed  PubMed Central  Google Scholar 

  27. 27

    Nielsen, R. Mapping mutations on phylogenies. Syst. Biol. 51, 729–739 (2002).

    PubMed  Google Scholar 

  28. 28

    Kosakovsky Pond, S. L. & Frost, S. D. W. A simple hierarchical approach to modeling distributions of substitution rates. Mol. Biol. Evol. 22, 223–234 (2004).

    Google Scholar 

  29. 29

    Kosakovsky Pond, S. L. & Frost, S. D. W. Not so different after all: a comparison of methods for detecting amino acid sites under selection. Mol. Biol. Evol. 22, 1208–1222 (2005). Landmark paper benchmarking different methods of site-specific rate inference.

    PubMed  Google Scholar 

  30. 30

    Lemey, P., Minin, V. N., Bielejec, F., Kosakovsky Pond, S. L. & Suchard, M. A. A counting renaissance: combining stochastic mapping and empirical Bayes to quickly detect amino acid sites under positive selection. Bioinformatics 28, 3248–3256 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  31. 31

    Rodrigue, N. On the statistical interpretation of site-specific variables in phylogeny-based substitution models. Genetics 193, 557–564 (2013).

    PubMed  PubMed Central  Google Scholar 

  32. 32

    Valdar, W. S. Scoring residue conservation. Proteins 48, 227–241 (2002).

    CAS  PubMed  Google Scholar 

  33. 33

    Capra, J. A. & Singh, M. Predicting functionally important residues from sequence conservation. Bioinformatics 23, 1875–1882 (2007).

    CAS  PubMed  Google Scholar 

  34. 34

    Johansson, F. & Toh, H. A comparative study of conservation and variation scores. BMC Bioinformatics 11, 311–388 (2010).

    Google Scholar 

  35. 35

    Muse, S. V. Estimating synonymous and nonsynonymous substitution rates. Mol. Biol. Evol. 13, 105–114 (1996).

    CAS  PubMed  Google Scholar 

  36. 36

    Meyer, A. G. & Wilke, C. O. Integrating sequence variation and protein structure to identify sites under selection. Mol. Biol. Evol. 30, 36–44 (2013).

    CAS  PubMed  Google Scholar 

  37. 37

    Li, W.-H., Wu, C.-I. & Luo, C.-C. A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution consider the relative likelihood of nucleotide and codon changes. Mol. Biol. Evol. 2, 150–174 (1985).

    PubMed  Google Scholar 

  38. 38

    Nei, M. & Gojobori, T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 3, 418–426 (1986).

    CAS  PubMed  Google Scholar 

  39. 39

    Yang, Z. & Nielsen, R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 17, 32–42 (2000).

    CAS  PubMed  Google Scholar 

  40. 40

    Meyer, S. & von Haeseler, A. Identifying site-specific substitution rates. Mol. Biol. Evol. 20, 182–189 (2003).

    CAS  PubMed  Google Scholar 

  41. 41

    Nielsen, R. & Yang, Z. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope. Genetics 148, 929–936.

  42. 42

    Yang, Z., Wong, W. S. W. & Nielsen, R. Bayes Empirical Bayes inference of amino acid sites under positive selection. Mol. Biol. Evol. 22, 1107–1118 (2005).

    CAS  PubMed  Google Scholar 

  43. 43

    Murrell, B. et al. Detecting individual sites subject to episodic diversifying selection. PLoS Genet. 8, e1002764 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  44. 44

    Kosakovsky Pond, S. L., Frost, S. D. W. & Muse, S. V. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21, 676–679 (2005).

    Google Scholar 

  45. 45

    Delport, W., Poon, A. F. Y., Frost, S. D. W. & Kosakovsky Pond, S. L. Datamonkey 2010: a suite of phylogenetic analysis tools for evolutionary biology. Bioinformatics 26, 2455–2457 (2010).

    CAS  PubMed  PubMed Central  Google Scholar 

  46. 46

    Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).

    CAS  PubMed  Google Scholar 

  47. 47

    Goldman, N. & Yang, Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 11, 725–736 (1994).

    CAS  PubMed  Google Scholar 

  48. 48

    Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214 (2007).

    PubMed  PubMed Central  Google Scholar 

  49. 49

    Murrell, B. et al. FUBAR: a fast, unconstrained Bayesian AppRoximation for inferring selection. Mol. Biol. Evol. 30, 1196–1205 (2013). This paper introduces an extremely rapid but accurate method to infer dN/dS.

    CAS  PubMed  PubMed Central  Google Scholar 

  50. 50

    Angelis, K., dos Reis, M. & Yang, Z. Bayesian estimation of nonsynonymous/synonymous rate ratios for pairwise sequence comparisons. Mol. Biol. Evol. 31, 1902–1913 (2014).

    CAS  PubMed  PubMed Central  Google Scholar 

  51. 51

    Pupko, T., Bell, R. E., Mayrose, I., Glaser, F. & Ben-Tal, N. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18, S71–S77 (2002). This paper introduced the Rate4Site method that is now widely used to calculate site-specific rates from amino acid sequence data.

    PubMed  Google Scholar 

  52. 52

    Mayrose, I., Graur, D., Ben-Tal, N. & Pupko, T. Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior. Mol. Biol. Evol. 21, 1781–1791 (2004).

    CAS  PubMed  Google Scholar 

  53. 53

    Fernandes, A. D. & Atchley, W. R. Site-specific evolutionary rates in proteins are better modeled as non-independent and strictly relative. Bioinformatics 24, 2177–2183 (2008).

    CAS  PubMed  PubMed Central  Google Scholar 

  54. 54

    Huang, Y. F. & Golding, G. B. Phylogenetic Gaussian process model for the inference of functionally important regions in protein tertiary structures. PLoS Comput. Biol. 10, e1003429–e1003412 (2014).

    PubMed  PubMed Central  Google Scholar 

  55. 55

    Huang, Y.-F. & Golding, G. B. FuncPatch: a web server for the fast bayesian inference of conserved functional patches in protein 3D structures. Bioinformatics 31, 523–531 (2015).

    PubMed  Google Scholar 

  56. 56

    Yang, J.-R., Liao, B.-Y., Zhuang, S.-M. & Zhang, J. Protein misinteraction avoidance causes highly expressed proteins to evolve slowly. Proc. Natl Acad. Sci. USA 109, E831–E840 (2012).

    CAS  PubMed  Google Scholar 

  57. 57

    Tien, M. Z., Meyer, A. G., Sydykova, D. K., Spielman, S. J. & Wilke, C. O. Maximum allowed solvent accessibilites of residues in proteins. PLoS ONE 8, e80635 (2013). This paper provides accurate normalization constants required for the calculation of relative solvent accessibility.

    PubMed  PubMed Central  Google Scholar 

  58. 58

    Hubbard, T. J. & Blundell, T. L. Comparison of solvent-inaccessible cores of homologous proteins: definitions useful for protein modelling. Protein Eng. 1, 159–171 (1987).

    CAS  PubMed  Google Scholar 

  59. 59

    Lim, W. A. & Sauer, R. T. Alternative packing arrangements in the hydrophobic core of λrepressor. Nature 339, 31–36 (1989).

    CAS  PubMed  Google Scholar 

  60. 60

    Overington, J., Johnson, M. S., Sali, A. & Blundell, T. L. Tertiary structural constraints on protein evolutionary diversity: templates, key residues and structure prediction. Proc. Biol. Sci. 241, 132–145 (1990).

    CAS  PubMed  Google Scholar 

  61. 61

    Topham, C. M. et al. Fragment ranking in modelling of protein structure. Conformationally constrained environmental amino acid substitution tables. J. Mol. Biol. 229, 194–220 (1993).

    CAS  PubMed  Google Scholar 

  62. 62

    Wako, H. & Blundell, T. L. Use of amino acid environment-dependent substitution tables and conformational propensities in structure prediction from aligned sequences of homologous proteins. I. solvent accessibility classes. J. Mol. Biol. 238, 682–692 (1994).

    CAS  PubMed  Google Scholar 

  63. 63

    Koshi, J. M. & Goldstein, R. A. Context-dependent optimal substitution matrices. Protein Eng. 8, 641–645 (1995).

    CAS  PubMed  Google Scholar 

  64. 64

    Goldman, N., Thorne, J. L. & Jones, D. T. Assessing the impact of secondary structure and solvent accessibility on protein evolution. Genetics 149, 445–458 (1998).

    CAS  PubMed  PubMed Central  Google Scholar 

  65. 65

    Conant, G. C. & Stadler, P. F. Solvent exposure imparts similar selective pressures across a range of yeast proteins. Mol. Biol. Evol. 26, 1155–1161 (2009).

    CAS  PubMed  Google Scholar 

  66. 66

    Ramsey, D. C., Scherrer, M. P., Zhou, T. & Wilke, C. O. The relationship between relative solvent accessibility and evolutionary rate in protein evolution. Genetics 188, 479–488 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  67. 67

    Scherrer, M. P., Meyer, A. G. & Wilke, C. O. Modeling coding-sequence evolution within the context of residue solvent accessibility. BMC Evol. Biol. 12, 179 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  68. 68

    Franzosa, E. A. & Xia, Y. Independent effects of protein core size and expression on residue-level structure-evolution relationships. PLoS ONE 7, e46602 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  69. 69

    Lin, C.-P. et al. Deriving protein dynamical properties from weighted protein contact number. Proteins 72, 929–935 (2008).

    CAS  PubMed  Google Scholar 

  70. 70

    England, J. L. & Shakhnovich, E. Structural determinant of protein designability. Phys. Rev. Lett. 90, 218101 (2003).

    PubMed  Google Scholar 

  71. 71

    Bloom, J. D., Drummond, D. A., Arnold, F. H. & Wilke, C. O. Structural determinants of the rate of protein evolution in yeast. Mol. Biol. Evol. 23, 1751–1761 (2006).

    CAS  PubMed  Google Scholar 

  72. 72

    Shakhnovich, B., Deeds, E., Delisi, C. & Shakhnovich, E. Protein structure and evolutionary history determine sequence space topology. Genome Res. 15, 385–392 (2005).

    CAS  PubMed  PubMed Central  Google Scholar 

  73. 73

    Zhou, T., Drummond, D. A. & Wilke, C. O. Contact density affects protein evolutionary rate from bacteria to animals. J. Mol. Evol. 66, 395–404 (2008).

    CAS  PubMed  Google Scholar 

  74. 74

    Yeh, S.-W. et al. Local packing density is the main structural determinant of the rate of protein sequence evolution at site level. BioMed Res. Int. 2014, 572409 (2014).

    PubMed  PubMed Central  Google Scholar 

  75. 75

    Marcos, M. L. & Echave, J. Too packed to change: side-chain packing and site-specific substitution rates in protein evolution. PeerJ 3, e911 (2015).

    PubMed  PubMed Central  Google Scholar 

  76. 76

    Mugal, C. F., Wolf, J. B. W. & Kaj, I. Why time matters: codon evolution and the temporal dynamics of dN/dS. Mol. Biol. Evol. 31, 212–231 (2014).

    CAS  PubMed  Google Scholar 

  77. 77

    Liu, Y. & Bahar, I. Sequence evolution correlates with structural dynamics. Mol. Biol. Evol. 29, 2253–2263 (2012). Study of the correlation between flexibility and site-specific sequence entropy.

    CAS  PubMed  PubMed Central  Google Scholar 

  78. 78

    Halle, B. Flexibility and packing in proteins. Proc. Natl Acad. Sci. USA 99, 1274–1279 (2002).

    CAS  PubMed  Google Scholar 

  79. 79

    Liao, H., Yeh, W., Chiang, D., Jernigan, R. L. & Lustig, B. Protein sequence entropy is closely related to packing density and hydrophobicity. Protein Eng. Des. Sel. 18, 59–64 (2005).

    CAS  PubMed  PubMed Central  Google Scholar 

  80. 80

    Worth, C. L., Gong, S. & Blundell, T. L. Structural and functional constraints in the evolution of protein families. Nat. Rev. Mol. Cell Biol. 10, 709–720 (2009).

    CAS  PubMed  Google Scholar 

  81. 81

    Bustamante, C. D., Townsend, J. P. & Hartl, D. L. Solvent accessibility and purifying selection within proteins of Escherichia coli and Salmonella enterica. Mol. Biol. Evol. 17, 301–308 (2000).

    CAS  PubMed  Google Scholar 

  82. 82

    Brown, C. J. et al. Evolutionary rate heterogeneity in proteins with long disordered regions. J. Mol. Evol. 55, 104–110 (2002).

    CAS  PubMed  Google Scholar 

  83. 83

    Brown, C. J., Johnson, A. K. & Daughdrill, G. W. Comparing models of evolution for ordered and disordered proteins. Mol. Biol. Evol. 27, 609–621 (2010).

    CAS  PubMed  Google Scholar 

  84. 84

    Tóth-Petróczy, A. & Tawfik, D. S. Slow protein evolutionary rates are dictated by surface-core association. Proc. Natl Acad. Sci. USA 108, 11151–11156 (2011). Systematic study of the distributions of site-specific rates for yeast proteins.

    PubMed  Google Scholar 

  85. 85

    Finkelstein, A. V., Ivankov, D. N., Garbuzynskiy, S. O. & Galzitskaya, O. V. Understanding the folding rates and folding nuclei of globular proteins. Curr. Protein Pept. Sci. 8, 521–536 (2007).

    CAS  PubMed  Google Scholar 

  86. 86

    Ptitsyn, O. B. Protein folding and protein evolution: common folding nucleus in different subfamilies of c-type cytochromes? J. Mol. Biol. 278, 655–666 (1998).

    CAS  PubMed  Google Scholar 

  87. 87

    Mirny, L. & Shakhnovich, E. Evolutionary conservation of the folding nucleus. J. Mol. Biol. 308, 123–129 (2001).

    CAS  PubMed  Google Scholar 

  88. 88

    Larson, S. M., Ruczinski, I., Davidson, A. R., Baker, D. & Plaxco, K. W. Residues participating in the protein folding nucleus do not exhibit preferential evolutionary conservation. J. Mol. Biol. 316, 225–233 (2002). Study that shows that sites involved in the folding nucleus are not particularly conserved.

    CAS  PubMed  Google Scholar 

  89. 89

    Tseng, Y. Y. & Liang, J. Are residues in a protein folding nucleus evolutionarily conserved? J. Mol. Biol. 335, 869–880 (2004).

    CAS  PubMed  Google Scholar 

  90. 90

    Drummond, D. A., Bloom, J. D., Adami, C., Wilke, C. O. & Arnold, F. H. Why highly expressed proteins evolve slowly. Proc. Natl Acad. Sci. USA 102, 14338–14343 (2005).

    CAS  PubMed  Google Scholar 

  91. 91

    Franzosa, E. A., Xue, R. & Xia, Y. Quantitative residue-level structure–evolution relationships in the yeast membrane proteome. Genome Biol. Evol. 5, 734–744 (2013).

    PubMed  PubMed Central  Google Scholar 

  92. 92

    Spielman, S. J. & Wilke, C. O. Membrane environment imposes unique selection pressures on transmembrane domains of G protein-coupled receptors. J. Mol. Evol. 76, 172–182 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  93. 93

    Bartlett, G. J., Porter, C. T., Borkakoti, N. & Thornton, J. M. Analysis of catalytic residues in enzyme active sites. J. Mol. Biol. 324, 105–121 (2002).

    CAS  PubMed  Google Scholar 

  94. 94

    Chelliah, V., Chen, L., Blundell, T. L. & Lovell, S. C. Distinguishing structural and functional restraints in evolution in order to identify interaction sites. J. Mol. Biol. 342, 1487–1504 (2004).

    CAS  PubMed  Google Scholar 

  95. 95

    McLaughlin R. N. Jr, Poelwijk, F. J., Raman, A., Gosal, W. S. & Ranganathan, R. The spatial architecture of protein function and adaptation. Nature 490, 138–142 (2012).

    Google Scholar 

  96. 96

    Mintseris, J. & Weng, Z. Structure, function, and evolution of transient and obligate protein–protein interactions. Proc. Natl Acad. Sci. USA 102, 10930–10935 (2005). This paper shows that sites that participate in obligate protein–protein interactions are more conserved than those involved in transient interactions.

    CAS  PubMed  Google Scholar 

  97. 97

    Kim, P. M., Lu, L. J., Xia, Y. & Gerstein, M. B. Relating three-dimensional structures to protein networks provides evolutionary insights. Science 314, 1938–1941 (2006).

    CAS  PubMed  Google Scholar 

  98. 98

    Huang, Y.-W., Chang, C.-M., Lee, C.-W. & Hwang, J.-K. The conservation profile of a protein bears the imprint of the molecule that is evolutionarily coupled to the protein. Proteins 83, 1407–1413 (2015).

    CAS  PubMed  Google Scholar 

  99. 99

    Kachroo, A. H. et al. Systematic humanization of yeast genes reveals conserved functions and genetic modularity. Science 348, 921–925 (2015).

    CAS  PubMed  PubMed Central  Google Scholar 

  100. 100

    Glaser, F., Morris, R. J., Najmanovich, R. J., Laskowski, R. A. & Thornton, J. M. A method for localizing ligand binding pockets in protein structures. Proteins Struct. Funct. Genet. 62, 479–488 (2006).

    CAS  PubMed  Google Scholar 

  101. 101

    Capra, J. A., Laskowski, R. A., Thornton, J. M., Singh, M. & Funkhouser, T. A. Predicting protein ligand binding sites by combining evolutionary sequence conservation and 3D structure. PLoS Comput. Biol. 5, e1000585 (2009).

    PubMed  PubMed Central  Google Scholar 

  102. 102

    Yang, J. S., Seo, S. W., Jang, S., Jung, G. Y. & Kim, S. Rational engineering of enzyme allosteric regulation through sequence evolution analysis. PLoS Comput. Biol. 8, e1002612–e1002610 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  103. 103

    Hill, R. E. & Hastie, N. D. Accelerated evolution in the reactive centre regions of serine protease inhibitors. Nature 326, 96–99 (1987).

    CAS  PubMed  Google Scholar 

  104. 104

    Hughes, A. L. & Nei, M. Pattern of nucleotide substitution at major histocompatibility complex class I loci reveals overdominant selection. Nature 335, 167–170 (1988).

    CAS  PubMed  Google Scholar 

  105. 105

    Bush, R. M., Fitch, W. M., Bender, C. A. & Cox, N. J. Positive selection on the H3 hemagglutinin gene of human influenza virus A. Mol. Biol. Evol. 16, 1457–1465 (1999).

    CAS  PubMed  Google Scholar 

  106. 106

    Shih, A. C., Hsiao, T., Ho, M. & Li, W. Simultaneous amino acid substitutions at antigenic sites drive influenza a hemagglutinin evolution. Proc. Natl Acad. Sci. USA 104, 6283–6288 (2007).

    CAS  PubMed  Google Scholar 

  107. 107

    Pan, K. & Deem, M. W. Quantifying selection and diversity in viruses by entropy methods, with application to the haemagglutinin of H3N2 influenza. J. R. Soc. Interface 8, 1644–1653 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  108. 108

    Tusche, C., Steinbrück, L. & McHardy, A. C. Detecting patches of protein sites of influenza A viruses under positive selection. Mol. Biol. Evol. 29, 2063–2071 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  109. 109

    Meyer, A. G. & Wilke, C. O. Geometric constraints dominate the antigenic evolution of influenza H3N2 hemagglutinin. PLoS Pathog. 11, e1004940 (2015).

    PubMed  PubMed Central  Google Scholar 

  110. 110

    Liberles, D. A. et al. The interface of protein structure, protein biophysics, and molecular evolution. Protein Sci. 21, 769–785 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  111. 111

    Harms, M. J. & Thornton, J. W. Evolutionary biochemistry: revealing the historical and physical causes of protein properties. Nat. Rev. Genet. 14, 559–571 (2013).

    CAS  PubMed  PubMed Central  Google Scholar 

  112. 112

    Zhou, H. & Zhou, Y. Quantifying the effect of burial of amino acid residues on protein stability. Proteins 322, 315–322 (2004).

    Google Scholar 

  113. 113

    Shaytan, A. K., Shaitan, K. V. & Khokhlov, A. R. Solvent accessible surface area of amino acid residues in globular proteins: correlation of apparent transfer free energies with experimental hydrophobicity scales. Biomacromolecules 10, 1224–1237 (2009).

    CAS  PubMed  Google Scholar 

  114. 114

    Bloom, J. D. & Glassman, M. J. Inferring stabilizing mutations from protein phylogenies: application to influenza hemagglutinin. PLoS Comput. Biol. 5, e1000349 (2009).

    PubMed  PubMed Central  Google Scholar 

  115. 115

    Wylie, S. C. & Shakhnovich, E. I. A biophysical protein folding model accounts for most mutational fitness effects in viruses. Proc. Natl Acad. Sci. USA 108, 9916–9921 (2011).

    CAS  PubMed  Google Scholar 

  116. 116

    Wylie, S. C. & Shakhnovich, E. I. Mutation induced extinction in finite populations: lethal mutagenesis and lethal isolation. PLoS Comput. Biol. 8, e1002609 (2012).

    CAS  PubMed  PubMed Central  Google Scholar 

  117. 117

    Guerois, R., Nielsen, J. E. & Serrano, L. Predicting changes in the stability of proteins and protein complexes: a study of more than 1000 mutations. J. Mol. Biol. 320, 369–387 (2002).

    CAS  PubMed  Google Scholar 

  118. 118

    Yang, L., Song, G. & Jernigan, R. L. Protein elastic network models and the ranges of cooperativity. Proc. Natl Acad. Sci. USA 106, 12347–12352 (2009).

    CAS  PubMed  Google Scholar 

  119. 119

    Spielman, S. J. & Wilke, C. O. The relationship between dN/dS and scaled selection coefficients. Mol. Biol. Evol. 32, 1097–1108 (2015). This paper establishes a mathematical relationship between mutation–selection models and dN/dS ratios.

    CAS  PubMed  PubMed Central  Google Scholar 

  120. 120

    Kolaczkowski, B. & Thornton, J. W. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431, 980–984 (2004).

    CAS  PubMed  Google Scholar 

  121. 121

    Kleinman, C. L., Rodrigue, N., Lartillot, N. & Philippe, H. Statistical potentials for improved structurally constrained evolutionary models. Mol. Biol. Evol. 27, 1546–1560 (2010).

    CAS  PubMed  Google Scholar 

  122. 122

    Pagel, M. Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proc. R. Soc. B Biol. Sci. 255, 37–45 (1994).

    Google Scholar 

  123. 123

    Muse, S. V. Evolutionary analyses of DNA sequences subject to constraints on secondary structure. Genetics 139, 1429–1439 (1995).

    CAS  PubMed  PubMed Central  Google Scholar 

  124. 124

    Poon, A. F. Y., Lewis, F. I., Kosakovsky Pond, S. L. & Frost, S. D. W. An evolutionary-network model reveals stratified interactions in the V3 loop of the HIV-1 envelope. PLoS Comput. Biol. 3, 2279–2290 (2007).

    CAS  Google Scholar 

  125. 125

    Carlson, J. M. et al. Phylogenetic dependency networks: inferring patterns of CTL escape and codon covariation in HIV-1 Gag. PLoS Comput. Biol. 4, e1000225 (2008).

    PubMed  PubMed Central  Google Scholar 

  126. 126

    Kryazhimskiy, S., Dushoff, J., Bazykin, G. A. & Plotkin, J. B. Prevalence of epistasis in the evolution of influenza A surface proteins. PLoS Genet. 7, e1001301 (2011).

    CAS  PubMed  PubMed Central  Google Scholar 

  127. 127

    Burger, L. & van Nimwegen, E. Disentangling direct from indirect co-evolution of residues in protein alignments. PLoS Comput. Biol. 6, e1000633 (2010).

    PubMed  PubMed Central  Google Scholar 

  128. 128

    Morcos, F. et al. Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl Acad. Sci. USA 108, E1293–E1301 (2011).

    CAS  PubMed  Google Scholar 

  129. 129

    Jones, D. T., Buchan, D. W., Cozzetto, D. & Pontil, M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28, 184–190 (2012).

  130.

    Skerker, J. M. et al. Rewiring the specificity of two-component signal transduction systems. Cell 133, 1043–1054 (2008).

  131.

    Cheng, R. R., Morcos, F., Levine, H. & Onuchic, J. N. Toward rationally redesigning bacterial two-component signaling systems using coevolutionary information. Proc. Natl Acad. Sci. USA 111, E563–E571 (2014).

  132.

    Leaver-Fay, A. et al. ROSETTA3: an object-oriented software suite for the simulation and design of macromolecules. Methods Enzymol. 487, 545–574 (2011).

  133.

    Ollikainen, N. & Kortemme, T. Computational protein design quantifies structural constraints on amino acid covariation. PLoS Comput. Biol. 9, e1003313 (2013).

  134.

    Jackson, E. L., Ollikainen, N., Covert, A. W., Kortemme, T. & Wilke, C. O. Amino-acid site variability among natural and designed proteins. PeerJ 1, e211 (2013).

  135.

    Tokuriki, N., Oldfield, C. J., Uversky, V. N., Berezovsky, I. N. & Tawfik, D. S. Do viral proteins possess unique biophysical features? Trends Biochem. Sci. 34, 53–59 (2009).

  136.

    Faure, G. & Koonin, E. V. Universal distribution of mutational effects on protein stability, uncoupling of protein robustness from sequence evolution and distinct evolutionary modes of prokaryotic and eukaryotic proteins. Phys. Biol. 12, 035001 (2015).

  137.

    Lopez, P., Casane, D. & Philippe, H. Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19, 1–7 (2002).

  138.

    Gu, X. Statistical methods for testing functional divergence after gene duplication. Mol. Biol. Evol. 16, 1664–1674 (1999).

  139.

    Gu, X. A simple statistical method for estimating type-II (cluster-specific) functional divergence of protein sequences. Mol. Biol. Evol. 23, 1937–1945 (2006).

  140.

    Pollock, D. D., Thiltgen, G. & Goldstein, R. A. Amino acid coevolution induces an evolutionary Stokes shift. Proc. Natl Acad. Sci. USA 109, E1352–E1359 (2012). This paper introduces the concept of the evolutionary Stokes shift: when an amino acid substitution occurs at a site, its neighbours evolve more rapidly to accommodate the substitution.

  141.

    Leferink, N. G. H. et al. Impact of residues remote from the catalytic centre on enzyme catalysis of copper nitrite reductase. Nat. Commun. 5, 4395 (2014).

  142.

    Fowler, D. M. et al. High-resolution mapping of protein sequence-function relationships. Nat. Methods 7, 741–746 (2010).

  143.

    Fowler, D. M. & Fields, S. Deep mutational scanning: a new style of protein science. Nat. Methods 11, 801–807 (2014).

  144.

    Romero, P. A., Tran, T. M. & Abate, A. R. Dissecting enzyme function with microfluidic-based deep mutational scanning. Proc. Natl Acad. Sci. USA 112, 7159–7164 (2015).

  145.

    Bloom, J. D. An experimentally informed evolutionary model improves phylogenetic fit to divergent lactamase homologs. Mol. Biol. Evol. 31, 2753–2769 (2014).

  146.

    Abriata, L. A., Palzkill, T. & Dal Peraro, M. How structural and physicochemical determinants shape sequence constraints in a functional enzyme. PLoS ONE 10, e0118684 (2015). This paper shows one example (TEM lactamase) for which functional constraints relax slowly with distance to the active site.

  147.

    Bloom, J. D. An experimentally determined evolutionary model dramatically improves phylogenetic fit. Mol. Biol. Evol. 31, 1956–1978 (2014). One of the first studies to parameterize a phylogenetic model with experimentally measured, site-specific parameters.

  148.

    Doud, M. B., Ashenberg, O. & Bloom, J. D. Site-specific amino acid preferences are mostly conserved in two closely related protein homologs. Mol. Biol. Evol. 32, 2944–2960 (2015).



J.E. is Principal Investigator of CONICET. This work was also supported in part by US National Institutes of Health (NIH) grant F31 GM113622-01 to S.J.S. and by NIH grant R01 GM088344, NSF Cooperative agreement DBI-0939454 (BEACON Center), and ARO grant W911NF-12-1-0390 to C.O.W.

Author information



Corresponding authors

Correspondence to Julian Echave or Claus O. Wilke.

Ethics declarations

Competing interests

The authors declare no competing financial interests.



Evolutionary rates

Number of substitutions (fixed mutations) per unit of evolutionary time.

Structural constraints

Structural features that correlate with sequence conservation (for example, solvent accessibility).

Functional constraints

Functional features that correlate with sequence conservation (for example, involvement in the active site).


Substitutions

Mutations that have spread to all members of the population (that is, have fixed), substituting the ancestral variant.


dN/dS

Ratio of non-synonymous to synonymous evolutionary rates.


dN

Non-synonymous evolutionary rate: that is, the rate at which non-synonymous substitutions (fixed mutations) occur per unit of evolutionary time.

Non-synonymous substitutions

DNA substitutions that change from a codon that codes for one amino acid to a codon that codes for a different amino acid.


dS

Synonymous evolutionary rate: that is, the rate at which synonymous substitutions (fixed mutations) occur per unit of evolutionary time.

Synonymous substitutions

DNA substitutions that change from a codon that codes for one amino acid to a codon that codes for the same amino acid.

Purifying selection

Loss of mutations that decrease fitness (deleterious).

Positive selection

Fixation of mutations that increase fitness (adaptive).


Rate4Site

Popular software to estimate relative site-specific rates from amino acid sequence data.

Accessible surface area

(ASA). Same as solvent accessible surface area.

Solvent accessible surface area

(SASA). Surface area of a given residue that is accessible to water.

Relative solvent accessibility

(RSA). Measures the proportion of the surface of an amino acid that is accessible to solvent (that is, water) in the folded protein structure, from 0 (completely inaccessible) to 1 (completely accessible). Calculated as the ratio of the solvent accessible surface area (SASA) of a given residue in the protein structure and the maximum SASA of that residue in a fully solvent-accessible conformation.
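The RSA ratio defined above can be illustrated with a minimal sketch. The function name and the numeric values in the usage line are hypothetical placeholders, not reference values; the normalizing maximum SASA depends on the residue type and on the normalization scale chosen, so it is passed in as a parameter. Clipping the result to the range 0 to 1 is an assumption made here to match the stated range.

```python
def relative_solvent_accessibility(sasa, max_sasa):
    """Return RSA = SASA / maximum SASA for one residue.

    sasa:     solvent accessible surface area of the residue in the structure
    max_sasa: maximum SASA of that residue type in a fully exposed
              conformation (supplied by the caller; scale is an assumption)
    """
    if max_sasa <= 0:
        raise ValueError("max_sasa must be positive")
    ratio = sasa / max_sasa
    # Assumption: clip to [0, 1] so the value matches the range in the definition.
    return min(max(ratio, 0.0), 1.0)


# Illustrative numbers only: a residue exposing 30 of a possible 120 square angstroms.
example_rsa = relative_solvent_accessibility(30.0, 120.0)
```

In this example the residue would be a quarter exposed; values near 0 correspond to buried sites and values near 1 to fully exposed sites.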

Contact number

(CN). Number of neighbouring residues present in a protein structure within a given distance (for example, 10 Å) from a focal residue.

Weighted contact number

(WCN). Similar to the contact number, but the neighbouring residues are weighted by their inverse square distance to the focal residue, and all residues in a structure are considered to be neighbouring residues.
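The two packing measures just defined (CN and WCN) can be sketched as follows. This is a minimal illustration, assuming each residue is represented by a single coordinate triple (for example, its Cα atom, which is a choice made here rather than something the definitions prescribe) and that no two residues share identical coordinates. The function name is hypothetical.

```python
def contact_numbers(coords, cutoff=10.0):
    """Return (CN, WCN) lists for residues given as (x, y, z) tuples.

    CN  counts the other residues within `cutoff` of the focal residue.
    WCN sums 1 / d**2 over all other residues, with no cutoff.
    """
    cn, wcn = [], []
    for i, a in enumerate(coords):
        n_contacts = 0
        weighted = 0.0
        for j, b in enumerate(coords):
            if i == j:
                continue  # a residue is not its own neighbour
            d2 = sum((x - y) ** 2 for x, y in zip(a, b))  # squared distance
            if d2 <= cutoff ** 2:
                n_contacts += 1
            weighted += 1.0 / d2  # inverse square distance weight
        cn.append(n_contacts)
        wcn.append(weighted)
    return cn, wcn


# Toy coordinates (hypothetical), with a short cutoff so the two measures differ.
toy_cn, toy_wcn = contact_numbers([(0, 0, 0), (1, 0, 0), (0, 2, 0)], cutoff=1.5)
```

Because WCN has no cutoff, it avoids the arbitrary choice of a distance threshold, at the cost of every residue contributing to every other residue's value.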

Mean square fluctuations

(MSFs). Time-average of the square norm of the vector that connects the instantaneous coordinates of a site to its equilibrium coordinates; measures the amount of movement a residue undergoes over time.
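As a rough sketch of this definition, MSFs could be computed from a series of coordinate snapshots as shown below. The function name and the input layout are hypothetical. Two assumptions are made: the snapshots have already been superimposed on a common reference frame, and the equilibrium coordinates of each site are approximated by its time-averaged position.

```python
def mean_square_fluctuations(trajectory):
    """Return one MSF value per site.

    trajectory: list of frames; each frame is a list of (x, y, z) tuples,
                one per site, in the same site order in every frame.
    """
    n_frames = len(trajectory)
    n_sites = len(trajectory[0])
    msf = []
    for i in range(n_sites):
        # Assumption: equilibrium position = time-averaged position of site i.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Time-average of the squared norm of the displacement vector.
        total = 0.0
        for frame in trajectory:
            total += sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
        msf.append(total / n_frames)
    return msf


# Toy trajectory (hypothetical): site 0 moves along x, site 1 stays fixed.
toy_msf = mean_square_fluctuations([
    [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(2.0, 0.0, 0.0), (5.0, 5.0, 5.0)],
])
```

A rigid site yields an MSF of zero, and larger values indicate more flexible sites.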


B factors

(Also known as temperature factors). Quantity that measures the amount of thermal motion of an atom in a protein crystal structure.


ΔΔG

Mutational change of stability; the folding free energy difference between mutant and wild type when each is in its own native conformation.


Mutational change of stability of the active conformation; free energy difference between the active conformation of the mutant and the active conformation of the wild type.


Mutational change of the activation free energy; difference between mutant and wild type of the free energy needed to deform the protein from the native into the active conformation.


Cite this article

Echave, J., Spielman, S. & Wilke, C. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 17, 109–121 (2016).
