The rate of evolution varies among sites within proteins owing to structural and functional constraints.
The main pattern of variation is due to structural constraints: evolutionary rates increase from the slowly evolving, solvent-inaccessible, tightly packed and rigid protein interior, to the rapidly evolving, solvent-exposed and loosely packed protein surface.
Functional constraints result in the slow evolution of sites that are directly involved in protein function and their neighbours. There may also be longer range effects on distant sites.
According to mechanistic biophysical models, site-specific evolutionary rates are related to mutational changes of thermodynamic stability. Under these models, structural predictors such as solvent accessibility and local packing act as proxies for mutational stability changes.
Our understanding of rate variation among sites remains limited: at best, current models explain approximately 60% of the observed variance in site-specific rates, and in many cases these models explain considerably less.
To make further progress, we need to develop better rate inference methods, complete the list of structural and functional molecular features that correlate with rates, and undertake further research on theoretical models derived from first principles.
It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific evolutionary rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
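The "fraction of variance explained" quoted above is usually reported as a coefficient of determination. The following is a minimal sketch, assuming a single structural predictor and a straight-line least-squares fit, in which case the value equals the squared Pearson correlation. The function name `variance_explained` and the toy numbers are illustrative assumptions and are not taken from the article.

```python
from math import fsum


def variance_explained(predictor, rates):
    """Return R^2 for a least-squares line fit of rates on one predictor.

    With a single predictor this is the squared Pearson correlation.
    """
    n = len(predictor)
    if n != len(rates) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 sites")
    mean_x = fsum(predictor) / n
    mean_y = fsum(rates) / n
    # Sums of squares and cross-products around the means.
    sxx = fsum((x - mean_x) ** 2 for x in predictor)
    syy = fsum((y - mean_y) ** 2 for y in rates)
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, rates))
    if sxx == 0 or syy == 0:
        raise ValueError("predictor and rates must both vary across sites")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data: a structural predictor and relative rates at five sites.
toy_predictor = [0.05, 0.20, 0.40, 0.60, 0.90]
toy_rates = [0.3, 0.6, 0.8, 1.5, 1.8]
print(round(variance_explained(toy_predictor, toy_rates), 2))
```

A value of 0.6 from such a calculation would correspond to the "approximately 60%" figure; models with several predictors would need a multiple-regression R^2 instead.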
J.E. is Principal Investigator of CONICET. This work was also supported in part by US National Institutes of Health (NIH) grant F31 GM113622-01 to S.J.S. and by NIH grant R01 GM088344, NSF Cooperative agreement DBI-0939454 (BEACON Center), and ARO grant W911NF-12-1-0390 to C.O.W.
The authors declare no competing financial interests.
- Evolutionary rates
Number of substitutions (fixed mutations) per unit of evolutionary time.
- Structural constraints
Structural features that correlate with sequence conservation (for example, solvent accessibility).
- Functional constraints
Functional features that correlate with sequence conservation (for example, involvement in the active site).
- Substitutions
Mutations that have spread to all members of the population (that is, have fixed), substituting the ancestral variant.
- dN/dS
Ratio of non-synonymous to synonymous evolutionary rates.
- dN
Non-synonymous evolutionary rate: that is, the rate at which non-synonymous substitutions (fixed mutations) occur per unit of evolutionary time.
- Non-synonymous substitutions
DNA substitutions that change from a codon that codes for one amino acid to a codon that codes for a different amino acid.
- dS
Synonymous evolutionary rate: that is, the rate at which synonymous substitutions (fixed mutations) occur per unit of evolutionary time.
- Synonymous substitutions
DNA substitutions that change from a codon that codes for one amino acid to a codon that codes for the same amino acid.
- Purifying selection
Loss of mutations that decrease fitness (deleterious).
- Positive selection
Fixation of mutations that increase fitness (adaptive).
- Rate4Site
Popular software to estimate relative site-specific rates from amino acid sequence data.
- Accessible surface area
(ASA). Same as solvent accessible surface area.
- Solvent accessible surface area
(SASA). Surface area of a given residue that is accessible to water.
- Relative solvent accessibility
(RSA). Measures the proportion of the surface of an amino acid that is accessible to solvent (that is, water) in the folded protein structure, from 0 (completely inaccessible) to 1 (completely accessible). Calculated as the ratio of the solvent accessible surface area (SASA) of a given residue in the protein structure and the maximum SASA of that residue in a fully solvent-accessible conformation.
- Contact number
(CN). Number of neighbouring residues present in a protein structure within a given distance (for example, 10 Å) from a focal residue.
- Weighted contact number
(WCN). Similar to the contact number, but the neighbouring residues are weighted by their inverse square distance to the focal residue, and all residues in a structure are considered to be neighbouring residues.
- Mean square fluctuations
(MSFs). Time-average of the square norm of the vector that connects the instantaneous coordinates of a site to its equilibrium coordinates; measures the amount of movement a residue undergoes over time.
- B factors
(Also known as temperature factors). Quantity that measures the amount of thermal motion of an atom in a protein crystal structure.
Mutational change of stability; the folding free energy difference between mutant and wild type when each is in its own native conformation.
Mutational change of stability of the active conformation; free energy difference between the active conformation of the mutant and the active conformation of the wild type.
Mutational change of the activation free energy; difference between mutant and wild type of the free energy needed to deform the protein from the native into the active conformation.
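The structural measures defined in the glossary above (RSA, CN, WCN and MSFs) translate directly into short calculations. The following is a minimal sketch under stated assumptions: each residue is represented by a single 3D coordinate (for example, one atom per residue), distances are in Å, and "within a given distance" is read as less than or equal to the cutoff. All function names are hypothetical, and the maximum SASA must be supplied by the caller from a published normalization table.

```python
from math import dist


def relative_solvent_accessibility(sasa, max_sasa):
    """RSA: SASA of a residue in the structure divided by its maximum SASA."""
    if max_sasa <= 0:
        raise ValueError("max_sasa must be positive")
    return sasa / max_sasa


def contact_number(coords, i, cutoff=10.0):
    """CN: number of other residues within `cutoff` of focal residue i."""
    return sum(
        1 for j, c in enumerate(coords)
        if j != i and dist(coords[i], c) <= cutoff
    )


def weighted_contact_number(coords, i):
    """WCN: sum over all other residues of the inverse squared distance to residue i."""
    return sum(
        1.0 / dist(coords[i], c) ** 2
        for j, c in enumerate(coords)
        if j != i
    )


def mean_square_fluctuation(positions, equilibrium):
    """MSF: average squared distance of instantaneous positions from the equilibrium position."""
    if not positions:
        raise ValueError("need at least one snapshot")
    return sum(dist(p, equilibrium) ** 2 for p in positions) / len(positions)


# Hypothetical toy structure: four residues, one coordinate each (in angstroms).
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
print(contact_number(toy_coords, 0))           # neighbours of residue 0 within 10 angstroms
print(weighted_contact_number(toy_coords, 0))  # distance-weighted count for residue 0
```

Unlike CN, the WCN function uses no cutoff: every residue contributes, but distant residues contribute very little because of the inverse-square weighting.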
About this article
Cite this article
Echave, J., Spielman, S. & Wilke, C. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 17, 109–121 (2016). https://doi.org/10.1038/nrg.2015.18