Letter

Sequence space and the ongoing expansion of the protein universe

Nature volume 465, pages 922–926 (17 June 2010)

Abstract

The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions [1–4]. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, 3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.
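As a rough illustration of the quantities the abstract refers to, the sketch below computes the fraction of identical sites between two aligned sequences and the identity expected between two random sequences. It is not the computational approach of the Letter, which the abstract does not specify; the function names, the toy sequences and the assumption of uniform amino-acid frequencies are all hypothetical choices made for illustration.

```python
from typing import Mapping

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def percent_identity(a: str, b: str) -> float:
    """Fraction of ungapped aligned columns in which both residues match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Skip any column where either sequence has a gap character.
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x == y for x, y in pairs) / len(pairs)


def random_identity(freqs: Mapping[str, float]) -> float:
    """Expected identity of two random sequences drawn independently
    from the same residue frequencies: the sum of squared frequencies."""
    return sum(p * p for p in freqs.values())


# Assumption: uniform residue frequencies; real proteomes are not uniform.
uniform = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}

if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, one gapped column).
    print(percent_identity("MKV-LA", "MRVTLA"))
    print(random_identity(uniform))
```

Under the uniform assumption the random baseline is 1/20; with unequal residue frequencies the sum of squares is larger, so the baseline for real proteins would sit somewhat above that value.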

References

  1. Trends in protein evolution inferred from sequence and structure analysis. Curr. Opin. Struct. Biol. 12, 392–399 (2002)

  2. Missense meanderings in sequence space: a biophysical view of protein evolution. Nature Rev. Genet. 6, 678–687 (2005)

  3. Genetic constraints on protein evolution. Crit. Rev. Biochem. Mol. Biol. 42, 313–326 (2007)

  4. Stability effects of mutations and protein evolvability. Curr. Opin. Struct. Biol. 19, 596–604 (2009)

  5. Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes. BMC Evol. Biol. 3, 2 (2003)

  6. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nature Rev. Microbiol. 1, 127–136 (2003)

  7. Protein superfamily evolution and the last universal common ancestor (LUCA). J. Mol. Evol. 63, 513–525 (2006)

  8. in Proc. Sixth Int. Congr. Genet. Vol. 1 (ed. Jones, D. F.) 356–366 (Genetics Society of America, 1932)

  9. Natural selection and the concept of a protein space. Nature 225, 563–564 (1970)

  10. Multidimensional epistasis and the disadvantage of sex. Proc. Natl Acad. Sci. USA 98, 12089–12092 (2001)

  11. Dobzhansky-Muller incompatibilities in protein evolution. Proc. Natl Acad. Sci. USA 99, 14878–14883 (2002)

  12. Perspective: sign epistasis and genetic constraint on evolutionary trajectories. Evolution 59, 1165–1174 (2005)

  13. Darwinian evolution can follow only very few mutational paths to fitter proteins. Science 312, 111–114 (2006)

  14. Empirical fitness landscapes reveal accessible evolutionary paths. Nature 445, 383–386 (2007)

  15. The structure of the protein universe and genome evolution. Nature 420, 218–223 (2002)

  16. How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J. Mol. Biol. 136, 225–230 (1980)

  17. Deciphering the message in protein sequences: tolerance to amino acid substitutions. Science 247, 1306–1310 (1990)

  18. Towards a covering set of protein family profiles. Prog. Biophys. Mol. Biol. 73, 321–337 (2000)

  19. Searching sequence space for protein catalysts. Proc. Natl Acad. Sci. USA 98, 10596–10601 (2001)

  20. Protein tolerance to random amino acid change. Proc. Natl Acad. Sci. USA 101, 9205–9210 (2004)

  21. Amino acid sequence determinants of beta-lactamase structure and activity. J. Mol. Biol. 258, 688–703 (1996)

  22. Mapping the protein universe. Science 273, 595–602 (1996)

  23. The nature of the universal ancestor and the evolution of the proteome. Curr. Opin. Struct. Biol. 10, 355–358 (2000)

  24. Expanding protein universe and its origin from the biological Big Bang. Proc. Natl Acad. Sci. USA 99, 14132–14136 (2002)

  25. A relation between distance and radial velocity among extra-galactic nebulae. Proc. Natl Acad. Sci. USA 15, 168–173 (1929)

  26. A genomic perspective on protein families. Science 278, 631–637 (1997)

  27. A maximum likelihood approach to the detection of selection from a phylogeny. J. Mol. Evol. 31, 511–523 (1990)

  28. A test of the nature of cosmic acceleration using galaxy redshift distortions. Nature 451, 541–544 (2008)

  29. Rate of sequence divergence under constant selection. Biol. Direct 5, 5 (2010)

  30. A universal trend of amino acid gain and loss in protein evolution. Nature 433, 633–638 (2005)

  31. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004)

  32. ATGC: a database of orthologous genes from closely related prokaryotic genomes and a research platform for microevolution of prokaryotes. Nucleic Acids Res. 37, D448–D454 (2009)

  33. Observations of amino acid gain and loss during protein evolution are explained by statistical bias. Mol. Biol. Evol. 23, 1444–1449 (2006)

  34. MRBAYES 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574 (2003)

  35. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007)

  36. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009)

Acknowledgements

We thank E. Koonin, Y. Wolf, A. Lobkovsky, D. Petrov, D. Ivankov, J. Sharpe, B. Lehner, Y. Jaeger, P. Vlasov, M. Ptitsyn and M. Roytberg for discussions and A. Kondrashov for extensive feedback on our manuscript. We thank D. Tawfik for inspiring us to start the investigation of the functional limits in sequence space.

Author information

Affiliations

  1. Bioinformatics and Genomics Programme, Centre for Genomic Regulation, Calle Dr Aiguader 88, Barcelona Biomedical Research Park Building, 08003 Barcelona, Spain

    • Inna S. Povolotskaya
    • Fyodor A. Kondrashov

Contributions

I.S.P. performed all analyses and obtained all of the data. F.A.K. conceived the study and drafted the manuscript. Both authors participated in the design of the analyses and the interpretation of the results.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Fyodor A. Kondrashov.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Information comprising: Rationale for avoiding deep ancestral state reconstructions and Deconstructing the Nt and Na measurements, References and Supplementary Figures 1–8 with legends.

  2. Supplementary Table 1

    This table contains genomes that have not been used as quadruplets but were assigned to COGs that were present in LUCA.

About this article

DOI

https://doi.org/10.1038/nature09105
