Sequence space and the ongoing expansion of the protein universe


The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions (refs 1–4). However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, 3.5 × 10⁹ yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.
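To make the comparison with random sequences concrete, the sketch below computes the fraction of identical sites between two aligned sequences and the identity expected between two unrelated random sequences. It is only an illustration and not the authors' method: the function names `pairwise_identity` and `random_identity` are hypothetical, and the uniform amino-acid frequencies are an assumption (real proteomes have uneven frequencies, which raises the random baseline).

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def pairwise_identity(seq_a, seq_b, gap="-"):
    """Fraction of ungapped alignment columns in which both residues match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return sum(a == b for a, b in columns) / len(columns)


def random_identity(frequencies):
    """Expected identity of two independent random sequences.

    Two sites match with probability sum(p_i ** 2), where p_i are the
    normalised residue frequencies.
    """
    total = sum(frequencies.values())
    return sum((f / total) ** 2 for f in frequencies.values())


# Assumption: all 20 amino acids are equally frequent.
uniform = {aa: 1.0 for aa in AMINO_ACIDS}

if __name__ == "__main__":
    print(pairwise_identity("ACDE", "ACDF"))  # 3 of 4 columns match
    print(random_identity(uniform))           # approximately 0.05
```

Under the uniform assumption the random baseline is about 5 per cent identity, so a pair of homologues whose identity approaches that value is, by this crude measure, no more similar than unrelated sequences.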


Figure 1: Expansion of the physical and the protein universes.
Figure 2: Measuring the rate of divergence of distant protein sequences.
Figure 3: The rate of expansion of the protein sequence universe.
Figure 4: Sequence space of two nucleotide sites.
Figure 5: Divergent and convergent evolution.


  1. Aravind, L., Mazumder, R., Vasudevan, S. & Koonin, E. V. Trends in protein evolution inferred from sequence and structure analysis. Curr. Opin. Struct. Biol. 12, 392–399 (2002)

  2. DePristo, M. A., Weinreich, D. M. & Hartl, D. L. Missense meanderings in sequence space: a biophysical view of protein evolution. Nature Rev. Genet. 6, 678–687 (2005)

  3. Camps, M., Herman, A., Loh, E. & Loeb, L. A. Genetic constraints on protein evolution. Crit. Rev. Biochem. Mol. Biol. 42, 313–326 (2007)

  4. Tokuriki, N. & Tawfik, D. S. Stability effects of mutations and protein evolvability. Curr. Opin. Struct. Biol. 19, 596–604 (2009)

  5. Mirkin, B. G., Fenner, T. I., Galperin, M. Y. & Koonin, E. V. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol. Biol. 3, 2 (2003)

  6. Koonin, E. V. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nature Rev. Microbiol. 1, 127–136 (2003)

  7. Ranea, J. A., Sillero, A., Thornton, J. M. & Orengo, C. A. Protein superfamily evolution and the last universal common ancestor (LUCA). J. Mol. Evol. 63, 513–525 (2006)

  8. Wright, S. in Proc. Sixth Int. Congr. Genet. Vol. 1 (ed. Jones, D. F.) 356–366 (Genetics Society of America, 1932)

  9. Maynard Smith, J. Natural selection and the concept of a protein space. Nature 225, 563–564 (1970)

  10. Kondrashov, F. A. & Kondrashov, A. S. Multidimensional epistasis and the disadvantage of sex. Proc. Natl Acad. Sci. USA 98, 12089–12092 (2001)

  11. Kondrashov, A. S., Sunyaev, S. & Kondrashov, F. A. Dobzhansky-Muller incompatibilities in protein evolution. Proc. Natl Acad. Sci. USA 99, 14878–14883 (2002)

  12. Weinreich, D. M., Watson, R. A. & Chao, L. Perspective: sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59, 1165–1174 (2005)

  13. Weinreich, D. M., Delaney, N. F., DePristo, M. A. & Hartl, D. L. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006)

  14. Poelwijk, F. J., Kiviet, D. J., Weinreich, D. M. & Tans, S. J. Empirical fitness landscapes reveal accessible evolutionary paths. Nature 445, 383–386 (2007)

  15. Koonin, E. V., Wolf, Y. I. & Karev, G. P. The structure of the protein universe and genome evolution. Nature 420, 218–223 (2002)

  16. Lesk, A. M. & Chothia, C. How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J. Mol. Biol. 136, 225–230 (1980)

  17. Bowie, J. U., Reidhaar-Olson, J. F., Lim, W. A. & Sauer, R. T. Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science 247, 1306–1310 (1990)

  18. Heger, A. & Holm, L. Towards a covering set of protein family profiles. Prog. Biophys. Mol. Biol. 73, 321–337 (2000)

  19. Taylor, S. V., Walter, K. U., Kast, P. & Hilvert, D. Searching sequence space for protein catalysts. Proc. Natl Acad. Sci. USA 98, 10596–10601 (2001)

  20. Guo, H. H., Choe, J. & Loeb, L. A. Protein tolerance to random amino acid change. Proc. Natl Acad. Sci. USA 101, 9205–9210 (2004)

  21. Huang, W., Petrosino, J., Hirsch, M., Shenkin, P. S. & Palzkill, T. Amino acid sequence determinants of beta-lactamase structure and activity. J. Mol. Biol. 258, 688–703 (1996)

  22. Holm, L. & Sander, C. Mapping the protein universe. Science 273, 595–602 (1996)

  23. Doolittle, W. F. The nature of the universal ancestor and the evolution of the proteome. Curr. Opin. Struct. Biol. 10, 355–358 (2000)

  24. Dokholyan, N. V., Shakhnovich, B. & Shakhnovich, E. I. Expanding protein universe and its origin from the biological Big Bang. Proc. Natl Acad. Sci. USA 99, 14132–14136 (2002)

  25. Hubble, E. A relation between distance and radial velocity among extra-galactic nebulae. Proc. Natl Acad. Sci. USA 15, 168–173 (1929)

  26. Tatusov, R. L., Koonin, E. V. & Lipman, D. J. A genomic perspective on protein families. Science 278, 631–637 (1997)

  27. Golding, B. & Felsenstein, J. A maximum likelihood approach to the detection of selection from a phylogeny. J. Mol. Evol. 31, 511–523 (1990)

  28. Guzzo, L. et al. A test of the nature of cosmic acceleration using galaxy redshift distortions. Nature 451, 541–544 (2008)

  29. Kondrashov, A. S., Povolotskaya, I. S., Ivankov, D. N. & Kondrashov, F. A. Rate of sequence divergence under constant selection. Biol. Direct 5, 5 (2010)

  30. Jordan, I. K. et al. A universal trend of amino acid gain and loss in protein evolution. Nature 433, 633–638 (2005)

  31. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004)

  32. Novichkov, P. S., Ratnere, I., Wolf, Y. I., Koonin, E. V. & Dubchak, I. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes. Nucleic Acids Res. 37, D448–D454 (2009)

  33. Goldstein, R. A. & Pollock, D. D. Observations of amino acid gain and loss during protein evolution are explained by statistical bias. Mol. Biol. Evol. 23, 1444–1449 (2006)

  34. Ronquist, F. & Huelsenbeck, J. P. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574 (2003)

  35. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007)

  36. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009)



We thank E. Koonin, Y. Wolf, A. Lobkovsky, D. Petrov, D. Ivankov, J. Sharpe, B. Lehner, Y. Jaeger, P. Vlasov, M. Ptitsyn and M. Roytberg for discussions and A. Kondrashov for extensive feedback on our manuscript. We thank D. Tawfik for inspiring us to start the investigation of the functional limits in sequence space.

Author information


I.S.P. performed all analyses and obtained all of the data. F.A.K. conceived the study and drafted the manuscript. Both authors participated in the design of the analyses and the interpretation of the results.

Corresponding author

Correspondence to Fyodor A. Kondrashov.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Information

This file contains Supplementary Information comprising: Rationale for avoiding deep ancestral state reconstructions and Deconstructing the Nt and Na measurements, References and Supplementary Figures 1-8 with legends. (PDF 515 kb)

Supplementary Table 1

This table contains genomes that have not been used as quadruplets but were assigned to COGs that were present in LUCA. (PDF 302 kb)


About this article

Cite this article

Povolotskaya, I., Kondrashov, F. Sequence space and the ongoing expansion of the protein universe. Nature 465, 922–926 (2010).
