Abstract
The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions (refs 1–4). However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor (LUCA). We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: ∼98 per cent of sites cannot accept an amino-acid substitution at any given moment, but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, ∼3.5 × 10⁹ yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.
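The contrast drawn above, between a protein whose few substitutable sites stay fixed and one whose permitted sites keep shifting through compensatory change, can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions, not the authors' method (which estimates divergence rates from real sequence data): two lineages descend from a random ancestor, at every step only 2% of sites are "permitted" to change, and the permitted set is either fixed or redrawn uniformly at random each step as a crude stand-in for compensatory changes. The function names `identity`, `expected_random_identity` and `simulate`, the uniform amino-acid composition, and all parameter values (500 sites, 20,000 substitutions per lineage) are hypothetical choices made for illustration.

```python
import random

# Hypothetical toy model for illustration only; not the method used in the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def identity(a: str, b: str) -> float:
    """Fraction of positions at which two aligned, equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def expected_random_identity(freqs: dict[str, float]) -> float:
    """Expected identity of two unrelated random sequences with residue frequencies p_i: sum of p_i**2."""
    return sum(p * p for p in freqs.values())


def simulate(length: int = 500, steps: int = 20_000, open_fraction: float = 0.02,
             shifting: bool = True, seed: int = 0) -> float:
    """Evolve two lineages from a shared random ancestor and return their identity.

    At every step only `open_fraction` of sites are permitted to change, and one
    substitution is made at a randomly chosen permitted site. If `shifting` is True,
    the permitted set is redrawn each step (a crude stand-in for compensatory changes
    elsewhere in the protein); if False, both lineages share one fixed permitted set.
    """
    rng = random.Random(seed)
    ancestor = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    k = max(1, round(open_fraction * length))
    fixed_sites = rng.sample(range(length), k)  # shared constraint in the non-shifting case
    lineages = []
    for _ in range(2):
        seq = list(ancestor)
        for _ in range(steps):
            permitted = rng.sample(range(length), k) if shifting else fixed_sites
            # The new residue is drawn uniformly and may equal the old one.
            seq[rng.choice(permitted)] = rng.choice(AMINO_ACIDS)
        lineages.append("".join(seq))
    return identity(lineages[0], lineages[1])


if __name__ == "__main__":
    print("fixed permitted sites:   ", simulate(shifting=False))  # never below 0.98
    print("shifting permitted sites:", simulate(shifting=True))   # near the random floor
    uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    print("random-sequence floor:   ", expected_random_identity(uniform))  # about 0.05
```

With a fixed permitted set the two lineages can differ at no more than 10 of the 500 sites, so their identity never drops below 0.98 however long they evolve. When the permitted set shifts, nearly every site is eventually substituted and identity falls toward the ∼0.05 expected for unrelated random sequences of uniform composition. Redrawing the set uniformly at random is a deliberate oversimplification; in real proteins, which sites become permissive depends on specific epistatic interactions.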
References
1. Aravind, L., Mazumder, R., Vasudevan, S. & Koonin, E. V. Trends in protein evolution inferred from sequence and structure analysis. Curr. Opin. Struct. Biol. 12, 392–399 (2002)
2. DePristo, M. A., Weinreich, D. M. & Hartl, D. L. Missense meanderings in sequence space: a biophysical view of protein evolution. Nature Rev. Genet. 6, 678–687 (2005)
3. Camps, M., Herman, A., Loh, E. & Loeb, L. A. Genetic constraints on protein evolution. Crit. Rev. Biochem. Mol. Biol. 42, 313–326 (2007)
4. Tokuriki, N. & Tawfik, D. S. Stability effects of mutations and protein evolvability. Curr. Opin. Struct. Biol. 19, 596–604 (2009)
5. Mirkin, B. G., Fenner, T. I., Galperin, M. Y. & Koonin, E. V. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol. Biol. 3, 2 (2003)
6. Koonin, E. V. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nature Rev. Microbiol. 1, 127–136 (2003)
7. Ranea, J. A., Sillero, A., Thornton, J. M. & Orengo, C. A. Protein superfamily evolution and the last universal common ancestor (LUCA). J. Mol. Evol. 63, 513–525 (2006)
8. Wright, S. in Proc. Sixth Int. Congr. Genet. Vol. 1 (ed. Jones, D. F.) 356–366 (Genetics Society of America, 1932)
9. Maynard Smith, J. Natural selection and the concept of a protein space. Nature 225, 563–564 (1970)
10. Kondrashov, F. A. & Kondrashov, A. S. Multidimensional epistasis and the disadvantage of sex. Proc. Natl Acad. Sci. USA 98, 12089–12092 (2001)
11. Kondrashov, A. S., Sunyaev, S. & Kondrashov, F. A. Dobzhansky-Muller incompatibilities in protein evolution. Proc. Natl Acad. Sci. USA 99, 14878–14883 (2002)
12. Weinreich, D. M., Watson, R. A. & Chao, L. Perspective: sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59, 1165–1174 (2005)
13. Weinreich, D. M., Delaney, N. F., DePristo, M. A. & Hartl, D. L. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006)
14. Poelwijk, F. J., Kiviet, D. J., Weinreich, D. M. & Tans, S. J. Empirical fitness landscapes reveal accessible evolutionary paths. Nature 445, 383–386 (2007)
15. Koonin, E. V., Wolf, Y. I. & Karev, G. P. The structure of the protein universe and genome evolution. Nature 420, 218–223 (2002)
16. Lesk, A. M. & Chothia, C. How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J. Mol. Biol. 136, 225–230 (1980)
17. Bowie, J. U., Reidhaar-Olson, J. F., Lim, W. A. & Sauer, R. T. Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science 247, 1306–1310 (1990)
18. Heger, A. & Holm, L. Towards a covering set of protein family profiles. Prog. Biophys. Mol. Biol. 73, 321–337 (2000)
19. Taylor, S. V., Walter, K. U., Kast, P. & Hilvert, D. Searching sequence space for protein catalysts. Proc. Natl Acad. Sci. USA 98, 10596–10601 (2001)
20. Guo, H. H., Choe, J. & Loeb, L. A. Protein tolerance to random amino acid change. Proc. Natl Acad. Sci. USA 101, 9205–9210 (2004)
21. Huang, W., Petrosino, J., Hirsch, M., Shenkin, P. S. & Palzkill, T. Amino acid sequence determinants of beta-lactamase structure and activity. J. Mol. Biol. 258, 688–703 (1996)
22. Holm, L. & Sander, C. Mapping the protein universe. Science 273, 595–602 (1996)
23. Doolittle, W. F. The nature of the universal ancestor and the evolution of the proteome. Curr. Opin. Struct. Biol. 10, 355–358 (2000)
24. Dokholyan, N. V., Shakhnovich, B. & Shakhnovich, E. I. Expanding protein universe and its origin from the biological Big Bang. Proc. Natl Acad. Sci. USA 99, 14132–14136 (2002)
25. Hubble, E. A relation between distance and radial velocity among extra-galactic nebulae. Proc. Natl Acad. Sci. USA 15, 168–173 (1929)
26. Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science 278, 631–637 (1997)
27. Golding, B. & Felsenstein, J. A maximum likelihood approach to the detection of selection from a phylogeny. J. Mol. Evol. 31, 511–523 (1990)
28. Guzzo, L. et al. A test of the nature of cosmic acceleration using galaxy redshift distortions. Nature 451, 541–544 (2008)
29. Kondrashov, A. S., Povolotskaya, I. S., Ivankov, D. N. & Kondrashov, F. A. Rate of sequence divergence under constant selection. Biol. Direct 5, 5 (2010)
30. Jordan, I. K. et al. A universal trend of amino acid gain and loss in protein evolution. Nature 433, 633–638 (2005)
31. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004)
32. Novichkov, P. S., Ratnere, I., Wolf, Y. I., Koonin, E. V. & Dubchak, I. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes. Nucleic Acids Res. 37, D448–D454 (2009)
33. Goldstein, R. A. & Pollock, D. D. Observations of amino acid gain and loss during protein evolution are explained by statistical bias. Mol. Biol. Evol. 23, 1444–1449 (2006)
34. Ronquist, F. & Huelsenbeck, J. P. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574 (2003)
35. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007)
36. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009)
Acknowledgements
We thank E. Koonin, Y. Wolf, A. Lobkovsky, D. Petrov, D. Ivankov, J. Sharpe, B. Lehner, Y. Jaeger, P. Vlasov, M. Ptitsyn and M. Roytberg for discussions and A. Kondrashov for extensive feedback on our manuscript. We thank D. Tawfik for inspiring us to start the investigation of the functional limits in sequence space.
Author information
Contributions
I.S.P. performed all analyses and obtained all of the data. F.A.K. conceived the study and drafted the manuscript. Both authors participated in the design of the analyses and the interpretation of the results.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Information
This file contains the Supplementary Information, comprising two sections ("Rationale for avoiding deep ancestral state reconstructions" and "Deconstructing the Nt and Na measurements"), references, and Supplementary Figures 1–8 with legends. (PDF 515 kb)
Supplementary Table 1
This table lists the genomes that were not used in quadruplets but were assigned to COGs present in LUCA. (PDF 302 kb)
About this article
Cite this article
Povolotskaya, I., Kondrashov, F. Sequence space and the ongoing expansion of the protein universe. Nature 465, 922–926 (2010). https://doi.org/10.1038/nature09105
DOI: https://doi.org/10.1038/nature09105
This article is cited by
- In silico characterization, molecular phylogeny, and expression profiling of genes encoding legume lectin-like proteins under various abiotic stresses in Arabidopsis thaliana. BMC Genomics (2022)
- Low-N protein engineering with data-efficient deep learning. Nature Methods (2021)
- Enigmatic persistence of dissolved organic matter in the ocean. Nature Reviews Earth & Environment (2021)
- Major antigenic site B of human influenza H3N2 viruses has an evolving local fitness landscape. Nature Communications (2020)
- Senescence and entrenchment in evolution of amino acid sites. Nature Communications (2020)