Sequence entropy of folding and the absolute rate of amino acid substitutions

Abstract

Adequate representations of protein evolution should consider how the acceptance of mutations depends on the sequence context in which they arise. However, epistatic interactions among sites in a protein result in heterogeneities in the substitution rate, both temporal and spatial, that are beyond the capabilities of current models. Here we use parallels between amino acid substitutions and chemical reaction kinetics to develop an improved theory of protein evolution. We constructed a mechanistic framework for modelling amino acid substitution rates that uses the formalisms of statistical mechanics, with principles of population genetics underlying the analysis. Theoretical analyses and computer simulations of proteins under purifying selection for thermodynamic stability show that substitution rates and the stabilization of resident amino acids (the ‘evolutionary Stokes shift’) can be predicted from biophysics and the effect of sequence entropy alone. Furthermore, we demonstrate that substitutions predominantly occur when epistatic interactions result in near neutrality; substitution rates are determined by how often epistasis results in such nearly neutral conditions. This theory provides a general framework for modelling protein sequence change under purifying selection, potentially explains patterns of convergence and mutation rates in real proteins that are incompatible with previous models, and provides a better null model for the detection of adaptive changes.
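
The abstract links a biophysical fitness (thermodynamic stability) to population-genetic fixation, and states that substitutions occur mainly under near neutrality. The preview does not show the authors' equations, so the following is only a minimal sketch under common textbook assumptions: fitness is taken as the folded fraction in a two-state folding model, and the fixation probability comes from Kimura's diffusion approximation for a haploid population. The function names, the value of kT, the resident stability and the population size are all illustrative assumptions, not values taken from the paper.

```python
import math

KT = 0.6  # kcal/mol, rough thermal energy near room temperature (assumed)


def folding_fitness(delta_g, kt=KT):
    """Folded fraction in a two-state model (assumed proxy for fitness).

    delta_g is the folding free energy in kcal/mol; negative means stable.
    """
    return 1.0 / (1.0 + math.exp(delta_g / kt))


def fixation_probability(s, n_eff):
    """Kimura's diffusion approximation for one new mutant, haploid population.

    s is the selection coefficient, n_eff the effective population size.
    """
    if abs(s) < 1e-12:
        return 1.0 / n_eff  # neutral limit
    exponent = -2.0 * n_eff * s
    if exponent > 700.0:
        return 0.0  # strongly deleterious; also avoids float overflow
    # (1 - exp(-2s)) / (1 - exp(-2 N s)), written with expm1 for accuracy
    return math.expm1(-2.0 * s) / math.expm1(exponent)


def relative_substitution_rate(dg_resident, ddg, n_eff, kt=KT):
    """Substitution rate relative to a neutral mutation (neutral gives 1)."""
    f_res = folding_fitness(dg_resident, kt)
    f_mut = folding_fitness(dg_resident + ddg, kt)
    s = (f_mut - f_res) / f_res
    return n_eff * fixation_probability(s, n_eff)


# Illustrative values only: resident stability of -5 kcal/mol, N = 10,000.
for ddg in (-1.0, 0.0, 0.5, 1.0, 2.0):
    rate = relative_substitution_rate(-5.0, ddg, 10_000)
    print(f"ddG = {ddg:+.1f} kcal/mol -> relative rate = {rate:.3g}")
```

In this toy setting the relative rate falls off steeply once a mutation destabilizes the protein by much more than a fraction of a kcal/mol, which is one way to picture why accepted substitutions cluster around near-neutral conditions.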

Fig. 1: Relative stabilities of amino acid pairs.
Fig. 2: Comparison of predicted and observed substitution rates.
Fig. 3: Example of a trajectory before and after a substitution from glutamic acid to lysine.
Fig. 4: Accuracy of site-specific stability and evolutionary Stokes shift predictions.

Acknowledgements

We thank B. Khatri for helpful discussions. We acknowledge the support of the Medical Research Council (UK) (MC_U117573805) and the Biotechnology and Biological Sciences Research Council (UK) (BB/P007562/1) to R.A.G. and the National Institutes of Health (NIH; GM083127 and GM097251) to D.D.P.

Author information

Contributions

R.A.G. and D.D.P. jointly designed the study, analysed the results and wrote the paper. R.A.G. wrote the simulation software and performed all mathematical derivations.

Corresponding author

Correspondence to David D. Pollock.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cite this article

Goldstein, R.A., Pollock, D.D. Sequence entropy of folding and the absolute rate of amino acid substitutions. Nat Ecol Evol 1, 1923–1930 (2017). https://doi.org/10.1038/s41559-017-0338-9

  • DOI: https://doi.org/10.1038/s41559-017-0338-9
