Letter

Alternative evolutionary histories in the sequence space of an ancient protein

Nature volume 549, pages 409–413 (21 September 2017)

Abstract

To understand why molecular evolution turned out as it did, we must characterize not only the path that evolution followed across the space of possible molecular sequences but also the many alternative trajectories that could have been taken but were not. A large-scale comparison of real and possible histories would establish whether the outcome of evolution represents an optimal state driven by natural selection or the contingent product of historical chance events [1]; it would also reveal how the underlying distribution of functions across sequence space shaped historical evolution [2,3]. Here we combine ancestral protein reconstruction [4] with deep mutational scanning [5–10] to characterize alternative histories in the sequence space around an ancient transcription factor, which evolved a novel biological function through well-characterized mechanisms [11,12]. We find hundreds of alternative protein sequences that use diverse biochemical mechanisms to perform the derived function at least as well as the historical outcome. These alternatives all require prior permissive substitutions that do not enhance the derived function, but not all require the same permissive changes that occurred during history. We find that if evolution had begun from a different starting point within the network of sequences encoding the ancestral function, outcomes with different genetic and biochemical forms would probably have resulted; this contingency arises from the distribution of functional variants in sequence space and epistasis between residues. Our results illuminate the topology of the vast space of possibilities from which history sampled one path, highlighting how the outcome of evolution depends on a serial chain of compounding chance events.

References

  1. Chance and Necessity: An Essay on the Natural Philosophy of Biology (Vintage Books, 1972)
  2. Natural selection and the concept of a protein space. Nature 225, 563–564 (1970)
  3. Neutralism and selectionism: a network-based reconciliation. Nat. Rev. Genet. 9, 965–974 (2008)
  4. Reconstructing ancient proteins to understand the causes of structure and function. Annu. Rev. Biophys. 46, 247–269 (2017)
  5. High-resolution mapping of protein sequence-function relationships. Nat. Methods 7, 741–746 (2010)
  6. Experimental illumination of a fitness landscape. Proc. Natl Acad. Sci. USA 108, 7896–7901 (2011)
  7. Pervasive degeneracy and epistasis in a protein-protein interface. Science 347, 673–677 (2015)
  8. Adaptation in protein fitness landscapes is facilitated by indirect paths. eLife 5, e16965 (2016)
  9. Evolving new protein-protein interaction specificity through promiscuous intermediates. Cell 163, 594–606 (2015)
  10. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016)
  11. Evolution of DNA specificity in a transcription factor family produced a new gene regulatory module. Cell 159, 58–68 (2014)
  12. Intermolecular epistasis shaped the function and evolution of an ancient transcription factor and its DNA binding sites. eLife 4, e07864 (2015)
  13. Genome-wide analysis of estrogen receptor binding sites. Nat. Genet. 38, 1289–1297 (2006)
  14. The glucocorticoid receptor dimer interface allosterically transmits sequence-specific DNA signals. Nat. Struct. Mol. Biol. 20, 876–883 (2013)
  15. Crystallographic analysis of the interaction of the glucocorticoid receptor with DNA. Nature 352, 497–505 (1991)
  16. The crystal structure of the estrogen receptor DNA-binding domain bound to DNA: how receptors discriminate between their response elements. Cell 75, 567–578 (1993)
  17. Evolution of distinct DNA-binding specificities within the nuclear receptor family of transcription factors. Proc. Natl Acad. Sci. USA 91, 4175–4179 (1994)
  18. Glucocorticoid receptor-DNA interactions: binding energetics are the primary determinant of sequence-specific transcriptional activity. J. Mol. Biol. 422, 18–32 (2012)
  19. Robustness of reconstructed ancestral protein functions to statistical uncertainty. Mol. Biol. Evol. 34, 247–261 (2017)
  20. Permissive secondary mutations enable the evolution of influenza oseltamivir resistance. Science 328, 1272–1275 (2010)
  21. Stability-mediated epistasis constrains the evolution of an influenza protein. eLife 2, e00631 (2013)
  22. Evolutionary biochemistry: revealing the historical and physical causes of protein properties. Nat. Rev. Genet. 14, 559–571 (2013)
  23. Epistasis in protein evolution. Protein Sci. 25, 1204–1218 (2016)
  24. Experimental interrogation of the path dependence and stochasticity of protein evolution using phage-assisted continuous evolution. Proc. Natl Acad. Sci. USA 110, 9007–9012 (2013)
  25. Crystal structure of an ancient protein: evolution by conformational epistasis. Science 317, 1544–1548 (2007)
  26. Historical contingency and its biophysical basis in glucocorticoid receptor evolution. Nature 512, 203–207 (2014)
  27. Predictable convergence in hemoglobin function has unpredictable molecular underpinnings. Science 354, 336–339 (2016)
  28. Contingency and entrenchment in protein evolution under purifying selection. Proc. Natl Acad. Sci. USA 112, E3226–E3235 (2015)
  29. An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. Nature 461, 515–519 (2009)
  30. Evolutionary meandering of intermolecular interactions along the drift barrier. Proc. Natl Acad. Sci. USA 112, E30–E38 (2015)
  31. An evolvable oestrogen receptor activity sensor: development of a modular system for integrating multiple genes into the yeast genome. Yeast 24, 379–390 (2007)
  32. Yeast vectors for the controlled expression of heterologous proteins in different genetic backgrounds. Gene 156, 119–122 (1995)
  33. Transformation of yeast by lithium acetate/single-stranded carrier DNA/polyethylene glycol method. Methods Enzymol. 350, 87–96 (2002)
  34. R Core Team. R: A language and environment for statistical computing (R Foundation for Statistical Computing, 2016)
  35. segmented: an R package to fit regression models with broken-line relationships. R News 8, 20–25 (2008)
  36. The nuclear receptor superfamily has undergone extensive proliferation and diversification in nematodes. Genome Res. 9, 103–120 (1999)
  37. An improved yeast transformation method for the generation of very large human antibody libraries. Protein Eng. Des. Sel. 23, 155–159 (2010)
  38. Quantifying and resolving multiple vector transformants in S. cerevisiae plasmid libraries. BMC Biotechnol. 9, 95 (2009)
  39. Measuring the activity of protein variants on a large scale using deep mutational scanning. Nat. Protocols 9, 2267–2284 (2014)
  40. Short barcodes for next generation sequencing. PLoS ONE 8, e82933 (2013)
  41. Sort-seq under the hood: implications of design choices on large-scale characterization of sequence-function relations. BMC Genomics 17, 206 (2016)
  42. fitdistrplus: an R package for fitting distributions. J. Stat. Softw. 64, (2015)
  43. L1 penalized continuation ratio models for ordinal response prediction using high-dimensional datasets. Stat. Med. 31, 1464–1474 (2012)
  44. rgexf: build, import and export GEXF graph files. R package version 0.15.3. (2015)
  45. Gephi: an open source software for exploring and manipulating networks. In Int. AAAI Conference on Weblogs and Social Media, vol. 8, 361–362 (Association for the Advancement of Artificial Intelligence, 2009)
  46. The igraph software package for complex network research. InterJ. Complex Syst. 1695, 1–9 (2006)
  47. Detecting high-order epistasis in nonlinear genotype-phenotype maps. Genetics 205, 1079–1088 (2017)
  48. The (mis)use of overlap of confidence intervals to assess effect modification. Eur. J. Epidemiol. 26, 253–254 (2011)
  49. The FoldX web server: an online force field. Nucleic Acids Res. 33, W382–W388 (2005)
  50. NUCPLOT: a program to generate schematic diagrams of protein-nucleic acid interactions. Nucleic Acids Res. 25, 4940–4945 (1997)
  51. Prediction of water and metal binding sites and their affinities by using the Fold-X force field. Proc. Natl Acad. Sci. USA 102, 10147–10152 (2005)
  52. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004)
  53. How structural and physicochemical determinants shape sequence constraints in a functional enzyme. PLoS ONE 10, e0118684 (2015)
  54. Using the correct statistical test for the equality of regression coefficients. Criminology 36, 859–866 (1998)

Acknowledgements

We thank J. Bridgham and B. Metzger for technical advice, members of the Thornton laboratory past and present for comments, the University of Chicago’s Flow Cytometry and Genomics Cores, and E. Thomas for poetic inspiration. This work was supported by National Institutes of Health R01GM104397 and R01GM121931 (J.W.T.), T32-GM007183 (T.N.S.), UL1-TR000430, and a National Science Foundation Graduate Research Fellowship (T.N.S.).

Author information

Affiliations

  1. Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, USA

    • Tyler N. Starr
  2. Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA

    • Lora K. Picton
    • Joseph W. Thornton
  3. Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA

    • Joseph W. Thornton

Authors

  1. Tyler N. Starr
  2. Lora K. Picton
  3. Joseph W. Thornton

Contributions

T.N.S. and J.W.T. conceived the project, designed experiments, and wrote the paper. L.K.P. and T.N.S. designed and constructed the reporter system. T.N.S. performed experiments and analysed data.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Joseph W. Thornton.

Reviewer Information

Nature thanks J. Bloom, A. de Visser, and D. Weinreich for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

PDF files

  1. Reporting Summary

Excel files

  1. Supplementary Table

    This file contains a list of oligonucleotide sequences used in this study.

  2. Supplementary Table

    This file contains a list of all RH sequences and their specificity classifications in each protein background as used in this study.

About this article

DOI

https://doi.org/10.1038/nature23902
