Alternative evolutionary histories in the sequence space of an ancient protein

Starr, Tyler N.; Picton, Lora K.; Thornton, Joseph W.

doi:10.1038/nature23902

Letter
Published: 13 September 2017

Tyler N. Starr¹,
Lora K. Picton² &
Joseph W. Thornton²,³

Nature volume 549, pages 409–413 (2017)

Abstract

To understand why molecular evolution turned out as it did, we must characterize not only the path that evolution followed across the space of possible molecular sequences but also the many alternative trajectories that could have been taken but were not. A large-scale comparison of real and possible histories would establish whether the outcome of evolution represents an optimal state driven by natural selection or the contingent product of historical chance events¹; it would also reveal how the underlying distribution of functions across sequence space shaped historical evolution²,³. Here we combine ancestral protein reconstruction⁴ with deep mutational scanning⁵,⁶,⁷,⁸,⁹,¹⁰ to characterize alternative histories in the sequence space around an ancient transcription factor, which evolved a novel biological function through well-characterized mechanisms¹¹,¹². We find hundreds of alternative protein sequences that use diverse biochemical mechanisms to perform the derived function at least as well as the historical outcome. These alternatives all require prior permissive substitutions that do not enhance the derived function, but not all require the same permissive changes that occurred during history. We find that if evolution had begun from a different starting point within the network of sequences encoding the ancestral function, outcomes with different genetic and biochemical forms would probably have resulted; this contingency arises from the distribution of functional variants in sequence space and epistasis between residues. Our results illuminate the topology of the vast space of possibilities from which history sampled one path, highlighting how the outcome of evolution depends on a serial chain of compounding chance events.

**Figure 1: Diverse sequences and mechanisms can yield the derived DNA specificity.**

**Figure 2: Evolvability of SRE specificity in an ancestral sequence space.**

**Figure 3: Historical permissive substitutions enhanced evolvability of SRE specificity.**

**Figure 4: Effect of historical permissive substitutions is mediated by non-specific increases in affinity.**

Accession codes

Primary accessions

BioProject

PRJNA362734

References

1. Monod, J. Chance and Necessity: An Essay on the Natural Philosophy of Biology (Vintage Books, 1972)
2. Maynard Smith, J. Natural selection and the concept of a protein space. Nature 225, 563–564 (1970)
3. Wagner, A. Neutralism and selectionism: a network-based reconciliation. Nat. Rev. Genet. 9, 965–974 (2008)
4. Hochberg, G. K. A. & Thornton, J. W. Reconstructing ancient proteins to understand the causes of structure and function. Annu. Rev. Biophys. 46, 247–269 (2017)
5. Fowler, D. M. et al. High-resolution mapping of protein sequence-function relationships. Nat. Methods 7, 741–746 (2010)
6. Hietpas, R. T., Jensen, J. D. & Bolon, D. N. A. Experimental illumination of a fitness landscape. Proc. Natl Acad. Sci. USA 108, 7896–7901 (2011)
7. Podgornaia, A. I. & Laub, M. T. Pervasive degeneracy and epistasis in a protein-protein interface. Science 347, 673–677 (2015)
8. Wu, N. C., Dai, L., Olson, C. A., Lloyd-Smith, J. O. & Sun, R. Adaptation in protein fitness landscapes is facilitated by indirect paths. eLife 5, e16965 (2016)
9. Aakre, C. D. et al. Evolving new protein-protein interaction specificity through promiscuous intermediates. Cell 163, 594–606 (2015)
10. Sarkisyan, K. S. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016)
11. McKeown, A. N. et al. Evolution of DNA specificity in a transcription factor family produced a new gene regulatory module. Cell 159, 58–68 (2014)
12. Anderson, D. W., McKeown, A. N. & Thornton, J. W. Intermolecular epistasis shaped the function and evolution of an ancient transcription factor and its DNA binding sites. eLife 4, e07864 (2015)
13. Carroll, J. S. et al. Genome-wide analysis of estrogen receptor binding sites. Nat. Genet. 38, 1289–1297 (2006)
14. Watson, L. C. et al. The glucocorticoid receptor dimer interface allosterically transmits sequence-specific DNA signals. Nat. Struct. Mol. Biol. 20, 876–883 (2013)
15. Luisi, B. F. et al. Crystallographic analysis of the interaction of the glucocorticoid receptor with DNA. Nature 352, 497–505 (1991)
16. Schwabe, J. W., Chapman, L., Finch, J. T. & Rhodes, D. The crystal structure of the estrogen receptor DNA-binding domain bound to DNA: how receptors discriminate between their response elements. Cell 75, 567–578 (1993)
17. Zilliacus, J., Carlstedt-Duke, J., Gustafsson, J. A. & Wright, A. P. Evolution of distinct DNA-binding specificities within the nuclear receptor family of transcription factors. Proc. Natl Acad. Sci. USA 91, 4175–4179 (1994)
18. Bain, D. L. et al. Glucocorticoid receptor-DNA interactions: binding energetics are the primary determinant of sequence-specific transcriptional activity. J. Mol. Biol. 422, 18–32 (2012)
19. Eick, G. N., Bridgham, J. T., Anderson, D. P., Harms, M. J. & Thornton, J. W. Robustness of reconstructed ancestral protein functions to statistical uncertainty. Mol. Biol. Evol. 34, 247–261 (2017)
20. Bloom, J. D., Gong, L. I. & Baltimore, D. Permissive secondary mutations enable the evolution of influenza oseltamivir resistance. Science 328, 1272–1275 (2010)
21. Gong, L. I., Suchard, M. A. & Bloom, J. D. Stability-mediated epistasis constrains the evolution of an influenza protein. eLife 2, e00631 (2013)
22. Harms, M. J. & Thornton, J. W. Evolutionary biochemistry: revealing the historical and physical causes of protein properties. Nat. Rev. Genet. 14, 559–571 (2013)
23. Starr, T. N. & Thornton, J. W. Epistasis in protein evolution. Protein Sci. 25, 1204–1218 (2016)
24. Dickinson, B. C., Leconte, A. M., Allen, B., Esvelt, K. M. & Liu, D. R. Experimental interrogation of the path dependence and stochasticity of protein evolution using phage-assisted continuous evolution. Proc. Natl Acad. Sci. USA 110, 9007–9012 (2013)
25. Ortlund, E. A., Bridgham, J. T., Redinbo, M. R. & Thornton, J. W. Crystal structure of an ancient protein: evolution by conformational epistasis. Science 317, 1544–1548 (2007)
26. Harms, M. J. & Thornton, J. W. Historical contingency and its biophysical basis in glucocorticoid receptor evolution. Nature 512, 203–207 (2014)
27. Natarajan, C. et al. Predictable convergence in hemoglobin function has unpredictable molecular underpinnings. Science 354, 336–339 (2016)
28. Shah, P., McCandlish, D. M. & Plotkin, J. B. Contingency and entrenchment in protein evolution under purifying selection. Proc. Natl Acad. Sci. USA 112, E3226–E3235 (2015)
29. Bridgham, J. T., Ortlund, E. A. & Thornton, J. W. An epistatic ratchet constrains the direction of glucocorticoid receptor evolution. Nature 461, 515–519 (2009)
30. Lynch, M. & Hagner, K. Evolutionary meandering of intermolecular interactions along the drift barrier. Proc. Natl Acad. Sci. USA 112, E30–E38 (2015)
31. Fox, J. E., Bridgham, J. T., Bovee, T. F. H. & Thornton, J. W. An evolvable oestrogen receptor activity sensor: development of a modular system for integrating multiple genes into the yeast genome. Yeast 24, 379–390 (2007)
32. Mumberg, D., Müller, R. & Funk, M. Yeast vectors for the controlled expression of heterologous proteins in different genetic backgrounds. Gene 156, 119–122 (1995)
33. Gietz, R. D. & Woods, R. A. Transformation of yeast by lithium acetate/single-stranded carrier DNA/polyethylene glycol method. Methods Enzymol. 350, 87–96 (2002)
34. R Core Team. R: A language and environment for statistical computing (R Foundation for Statistical Computing, 2016)
35. Muggeo, V. M. R. segmented: an R package to fit regression models with broken-line relationships. R News 8, 20–25 (2008)
36. Sluder, A. E., Mathews, S. W., Hough, D., Yin, V. P. & Maina, C. V. The nuclear receptor superfamily has undergone extensive proliferation and diversification in nematodes. Genome Res. 9, 103–120 (1999)
37. Benatuil, L., Perez, J. M., Belk, J. & Hsieh, C. M. An improved yeast transformation method for the generation of very large human antibody libraries. Protein Eng. Des. Sel. 23, 155–159 (2010)
38. Scanlon, T. C., Gray, E. C. & Griswold, K. E. Quantifying and resolving multiple vector transformants in S. cerevisiae plasmid libraries. BMC Biotechnol. 9, 95 (2009)
39. Fowler, D. M., Stephany, J. J. & Fields, S. Measuring the activity of protein variants on a large scale using deep mutational scanning. Nat. Protocols 9, 2267–2284 (2014)
40. Mir, K., Neuhaus, K., Bossert, M. & Schober, S. Short barcodes for next generation sequencing. PLoS ONE 8, e82933 (2013)
41. Peterman, N. & Levine, E. Sort-seq under the hood: implications of design choices on large-scale characterization of sequence-function relations. BMC Genomics 17, 206 (2016)
42. Delignette-Muller, M. L. & Dutang, C. fitdistrplus: an R package for fitting distributions. J. Stat. Softw. 64, http://dx.doi.org/10.18637/jss.v064.i04 (2015)
43. Archer, K. J. & Williams, A. A. A. L1 penalized continuation ratio models for ordinal response prediction using high-dimensional datasets. Stat. Med. 31, 1464–1474 (2012)
44. Vega Yon, J., Fábrega Lacoa, J. & Kunst, J. B. rgexf: build, import and export GEXF graph files. R package version 0.15.3. https://CRAN.R-project.org/package=rgexf (2015)
45. Bastian, M., Heymann, S. & Jacomy, M. Gephi: an open source software for exploring and manipulating networks. In Int. AAAI Conference on Weblogs and Social Media, vol. 8, 361–362 (Association for the Advancement of Artificial Intelligence, 2009)
46. Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJ. Complex Syst. 1695, 1–9 (2006)
47. Sailer, Z. R. & Harms, M. J. Detecting high-order epistasis in nonlinear genotype-phenotype maps. Genetics 205, 1079–1088 (2017)
48. Knol, M. J., Pestman, W. R. & Grobbee, D. E. The (mis)use of overlap of confidence intervals to assess effect modification. Eur. J. Epidemiol. 26, 253–254 (2011)
49. Schymkowitz, J. et al. The FoldX web server: an online force field. Nucleic Acids Res. 33, W382–W388 (2005)
50. Luscombe, N. M., Laskowski, R. A. & Thornton, J. M. NUCPLOT: a program to generate schematic diagrams of protein-nucleic acid interactions. Nucleic Acids Res. 25, 4940–4945 (1997)
51. Schymkowitz, J. W. H. et al. Prediction of water and metal binding sites and their affinities by using the Fold-X force field. Proc. Natl Acad. Sci. USA 102, 10147–10152 (2005)
52. Crooks, G. E., Hon, G., Chandonia, J.-M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004)
53. Abriata, L. A., Palzkill, T. & Dal Peraro, M. How structural and physicochemical determinants shape sequence constraints in a functional enzyme. PLoS ONE 10, e0118684 (2015)
54. Paternoster, R., Brame, R., Mazerolle, P. & Piquero, A. Using the correct statistical test for the equality of regression coefficients. Criminology 36, 859–866 (1998)

Acknowledgements

We thank J. Bridgham and B. Metzger for technical advice, members of the Thornton laboratory past and present for comments, the University of Chicago’s Flow Cytometry and Genomics Cores, and E. Thomas for poetic inspiration. This work was supported by National Institutes of Health R01GM104397 and R01GM121931 (J.W.T.), T32-GM007183 (T.N.S.), UL1-TR000430, and a National Science Foundation Graduate Research Fellowship (T.N.S.).

Author information

Authors and Affiliations

Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, 60637, Illinois, USA
Tyler N. Starr
Department of Ecology and Evolution, University of Chicago, Chicago, 60637, Illinois, USA
Lora K. Picton & Joseph W. Thornton
Department of Human Genetics, University of Chicago, Chicago, 60637, Illinois, USA
Joseph W. Thornton

Contributions

T.N.S. and J.W.T. conceived the project, designed experiments, and wrote the paper. L.K.P. and T.N.S. designed and constructed the reporter system. T.N.S. performed experiments and analysed data.

Corresponding author

Correspondence to Joseph W. Thornton.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Reviewer Information Nature thanks J. Bloom, A. de Visser, and D. Weinreich for their contribution to the peer review of this work.

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Figure 1 Design and validation of a yeast FACS-seq assay for steroid receptor DNA-binding function.

a, GFP activation in ERE (purple) and SRE (green) yeast reporters correlates with previously measured protein–DNA binding affinity¹¹,¹². Asterisk, stop-codon-containing variant. Dashed line, best-fit segmented-linear relationship between GFP activation and log₁₀(K_a,mac). b, Histogram of the per-cell green fluorescence for AncSR1 on ERE measured via flow cytometry, fitted to a logistic distribution (dashed line). c, Distributions providing the best fit to flow cytometry data for isogenic cultures of 101 DBD variants, using Akaike information criterion. d, Comparisons of mean fluorescence estimates between FACS-seq replicates of each protein/response element combination. Black points, coding RH variants; light grey, stop-codon-containing variants. R²_pos, squared Pearson correlation coefficient for variants with mean fluorescence significantly higher than stop-codon-containing variants in either or both replicates. e, Comparisons between mean fluorescence as determined in FACS-seq and via flow cytometry analysis of isogenic cultures for a random selection of clones from each library. Dashed line, best-fit linear regression. f, Robustness of classification to sampling depth. Variants were binned according to the minimum number of cells with which they were sampled in either replicate. Below 15 cells sampled (dashed line), the probability that a variant called active in one replicate was also called active in the other is dependent on sampling depth; to minimize errors due to sampling depth, we eliminated as ‘undetermined’ all variants with fewer than 15 cells sampled after pooling replicates. g, Standard error of mean fluorescence estimates (s.e.m.) in each library as a function of sampling depth. Top: for each background, the relationship between s.e.m. and sampling depth for ERE (purple) and SRE (green) libraries, as estimated from the sampling distribution of stop-codon-containing variants (dotted lines) or variability in mean fluorescence estimates between replicates (solid lines). Bottom: the cumulative fraction of coding variants in each library having a certain number of cells sampled in the pooled data.
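
The depth filter described in panel f can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the function names, the input layout (a mapping from variant to per-replicate cell counts) and the example counts are all hypothetical; only the 15-cell threshold on pooled replicates comes from the legend.

```python
MIN_CELLS = 15  # threshold stated in the legend (cells sampled after pooling replicates)

def pooled_depth(rep1_cells, rep2_cells):
    """Total number of cells sampled for a variant across both replicates."""
    return rep1_cells + rep2_cells

def filter_by_depth(cell_counts, min_cells=MIN_CELLS):
    """Split variants into retained and 'undetermined' lists by pooled depth.

    cell_counts: dict mapping variant name -> (cells in replicate 1, cells in replicate 2).
    """
    retained, undetermined = [], []
    for variant, (rep1, rep2) in cell_counts.items():
        if pooled_depth(rep1, rep2) < min_cells:
            undetermined.append(variant)
        else:
            retained.append(variant)
    return retained, undetermined

# Hypothetical counts, for illustration only (not data from the study).
example = {"variant_1": (3, 4), "variant_2": (40, 55), "variant_3": (7, 8)}
retained, undetermined = filter_by_depth(example)
```

Here a variant with exactly 15 pooled cells is retained, because the legend excludes only variants with fewer than 15.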

Extended Data Figure 2 Representative FACS gates for library sorting.

a, A scatterplot of side-angle scattering (SSC-A) and forward-angle scattering (FSC-A) selects for a homogenous cell population (P1). b, A scatterplot of the height of the per-cell forward scatter peak (FSC-H) and the integrated area of this peak (FSC-A) excludes events where multiple cells pass through the detector simultaneously (P2). c, Final sort bins (P3–P6) are drawn on the distribution of green fluorescence (FITC-A). d, Table showing the hierarchical parentage of sort gates and the percentage of events that fall in each bin.

Extended Data Figure 3 Models to predict the function of missing genotypes.

For each protein/response element combination, a continuation ratio ordinal logistic regression model was constructed to predict the functional class of a variant as a function of its four RH amino-acid states, including possible first-order main effects and second-order pairwise epistatic effects. Tenfold cross-validation was used to select the penalization parameter λ and evaluate performance. a, b, True positive rate (left, TPR, the proportion of experimental positives that are predicted positive) and positive predictive value (right, PPV, the proportion of predicted positives that are experimentally positive) are shown as a function of λ for AncSR1+11P on ERE. Classifications were evaluated for (a) all active (weak and strong) versus inactive variants and (b) strong active versus weak active and inactive variants. Grey dotted lines, cross-validation replicates; solid line, mean. Dashed line shows the chosen value of λ = 10⁻⁵; as λ continues to decrease beyond λ = 10⁻⁵, the true positive rate plateaus but positive predictive value continues to decline. c, The number of non-zero parameters included in each model as a function of λ. Dashed line, λ = 10⁻⁵. d, Summary of performance metrics from tenfold cross-validation for each model with λ = 10⁻⁵. Accuracy is the proportion of predicted classifications (strong, weak, and inactive) that match their experimentally determined classes.
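
The three performance metrics defined in this legend can be written out directly. The sketch below is a minimal Python illustration using the definitions given above; the function names and the toy labels are hypothetical, and the code assumes at least one experimental positive and one predicted positive are present.

```python
def true_positive_rate(observed, predicted):
    """Proportion of experimental positives that are predicted positive."""
    hits = [p for o, p in zip(observed, predicted) if o]
    return sum(hits) / len(hits)

def positive_predictive_value(observed, predicted):
    """Proportion of predicted positives that are experimentally positive."""
    called = [o for o, p in zip(observed, predicted) if p]
    return sum(called) / len(called)

def accuracy(observed_classes, predicted_classes):
    """Proportion of predicted classes (strong, weak, inactive) matching experiment."""
    matches = sum(o == p for o, p in zip(observed_classes, predicted_classes))
    return matches / len(observed_classes)

# Toy labels, for illustration only (not data from the study).
observed_classes = ["strong", "weak", "inactive", "inactive", "strong", "weak"]
predicted_classes = ["strong", "inactive", "inactive", "weak", "strong", "weak"]

# Panel a style comparison: all active (weak and strong) versus inactive.
obs_active = [c != "inactive" for c in observed_classes]
pred_active = [c != "inactive" for c in predicted_classes]

tpr = true_positive_rate(obs_active, pred_active)
ppv = positive_predictive_value(obs_active, pred_active)
acc = accuracy(observed_classes, predicted_classes)
```

The panel b comparison (strong active versus weak active and inactive) follows by replacing the boolean test with `c == "strong"`.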

Extended Data Figure 4 Biophysical diversity in DNA recognition.

a, b, Diverse mechanisms for recognition of SRE (a) or ERE (b) by the historical RH genotypes (GSKV and EGKA) and alternative SRE-specific variants. Contacts from FoldX-generated structural models are shown between RH residues (circles) and DNA bases (letters), backbone phosphates (small circles) and sugars (pentagons, numbered by position in the DNA motif; dashed numbers refer to the complementary strand). Hydrogen bonds are shown as dashed arrows from donor to acceptor; dotted lines, non-bonded contacts. Red squares, bases that form hydrogen bonds in the EGKA-ERE structure that are unsatisfied in complex with an SRE-specific RH; red circles, side chains with polar groups that are not satisfied in complex with ERE. Only DNA contacts that vary among the analysed structures are shown. c, Large side chains at position 29 correlate with the loss of a conserved R33 hydrogen bond to ERE. For ERE-bound structural models, the distance of the Arg33 guanidinium hydrogen to the ERE T4 carbonyl oxygen was measured and compared with the atomic volume of the residue at position 29 in that variant.

Extended Data Figure 5 The ancestral RH (EGKA) and derived RH (GSKV) can access many SRE-specific outcomes by short paths in AncSR1+11P.

a, Concentric rings contain RH genotypes of minimum path length one, two, or three steps from AncSR1+11P:EGKA (centre). The historical outcome (GSKV, boxed, bottom) is accessible through a three-step path (EGKA–GGKA–GGKV–GSKV). Alternative SRE-specific outcomes accessible in three or fewer steps are in green. Lines connect genotypes separated by a single non-synonymous nucleotide mutation; lines among genotypes in the outer ring are not shown for clarity. Orange arrows indicate paths of significantly increasing SRE mean fluorescence. b, For trajectories indicated by orange arrows in a, SRE mean fluorescence is shown versus mutational distance from AncSR1+11P:EGKA (with x-axis jitter to avoid overplotting). Grey lines connect variants separated by single-nucleotide mutations. Error bars, 90% confidence intervals. Green dashed line, activity of AncSR1+11P:GSKV on SRE. c, For the SRE-specific outcomes accessed in orange paths in a, the probability of each outcome under models where the probability of taking a step depends on the relative increase in SRE mean fluorescence (correlated fixation model), or where any SRE-enhancing step is equally likely (equal fixation model)⁸. d, The historical outcome (GSKV) has SRE-specific single-mutant neighbours. Concentric rings contain SRE-specific RH genotypes of path length one or two steps from AncSR1+11P:GSKV (centre). Lines connect genotypes separated by a single non-synonymous nucleotide mutation; lines among genotypes in the outer ring are not shown for clarity. e, The distribution of SRE mean fluorescence of SRE-specific neighbours of AncSR1+11P:GSKV illustrated in d. Error bars, 90% confidence intervals.
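
The notion of path length used in this legend can be illustrated with a short Python sketch. It assumes that two RH genotypes are connected when they differ at one site and some codon for the first amino acid is one nucleotide away from some codon for the second under the standard genetic code; whether the original analysis defined connectivity in exactly this way is an assumption, and the function names are hypothetical. The only genotypes used are the four on the historical path named above.

```python
from collections import deque
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def codons_for(aa):
    """All codons encoding the given amino acid."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]

def single_nucleotide_step(aa1, aa2):
    """True if some codon for aa1 is one nucleotide change from a codon for aa2."""
    if aa1 == aa2:
        return False
    return any(
        sum(x != y for x, y in zip(c1, c2)) == 1
        for c1 in codons_for(aa1)
        for c2 in codons_for(aa2)
    )

def are_neighbours(g1, g2):
    """Genotypes are neighbours if they differ at one site by a single-nucleotide step."""
    diffs = [(a, b) for a, b in zip(g1, g2) if a != b]
    return len(diffs) == 1 and single_nucleotide_step(*diffs[0])

def shortest_path(start, target, allowed):
    """Breadth-first search for a minimum-length path through the allowed genotypes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for g in allowed:
            if g not in seen and are_neighbours(path[-1], g):
                seen.add(g)
                queue.append(path + [g])
    return None

# The four genotypes on the historical path named in the legend.
allowed = ["EGKA", "GGKA", "GGKV", "GSKV"]
path = shortest_path("EGKA", "GSKV", allowed)
steps = len(path) - 1
```

Under these assumptions the search recovers the three-step path EGKA, GGKA, GGKV, GSKV. In a full analysis `allowed` would be the set of functional genotypes in the library rather than this four-element list.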

Extended Data Figure 6 Evolvability of SRE specificity in an ancestral sequence space.

a, Alternative ERE-specific starting points reach SRE-specific outcomes with very different amino-acid states. For each starting point accessing at least 15 outcomes (the median of all starting points), the frequency profile of amino-acid states at each RH site was determined for the set of SRE-specific outcomes reached in three or fewer steps; for each pair of starting points, the Jensen–Shannon (J–S) distance between profiles was calculated. Blue curve, distribution of pairs of starting points by J–S distances of the outcomes they reach; grey, distribution of J–S distances between profiles for randomly sampled sets of SRE-specific variants. In each modal peak, the amino-acid frequency profiles for outcomes reached by a representative pair of ERE-specific starting points are shown. b–d, Contingency in the accessibility of individual SRE-specific outcomes remains when path lengths longer than the historical trajectory are considered. Plots are equivalent to Fig. 2b–d but for trajectories of increasing length.
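
For readers unfamiliar with the Jensen–Shannon distance used in panel a, the following Python sketch computes it for one pair of amino-acid frequency profiles at a single RH site. The legend does not state the logarithm base or how the four sites were combined, so the base-2 logarithm (which bounds the distance between 0 and 1) and the per-site treatment are assumptions here, and the profiles are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(profile_a, profile_b):
    """Jensen-Shannon distance (square root of the J-S divergence) between two
    frequency profiles given as dicts mapping amino-acid state -> frequency."""
    states = sorted(set(profile_a) | set(profile_b))
    p = [profile_a.get(s, 0.0) for s in states]
    q = [profile_b.get(s, 0.0) for s in states]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding error

# Hypothetical site profiles, for illustration only.
same = js_distance({"G": 0.5, "A": 0.5}, {"G": 0.5, "A": 0.5})
disjoint = js_distance({"G": 1.0}, {"K": 1.0})
partial = js_distance({"G": 0.8, "A": 0.2}, {"G": 0.2, "A": 0.8})
```

Identical profiles give a distance of 0 and profiles with no shared states give 1, so pairs of starting points that reach very different outcomes fall towards the upper end of the scale.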

Extended Data Figure 7 The historical starting point cannot access the derived function without permissive mutations.

a, AncSR1 RH functional network layout as in Fig. 3c, with the shortest paths from AncSR1:EGKA to SRE specificity highlighted. The ancestral RH (EGKA) can access SRE specificity. However, all trajectories are at least five steps long, require permissive RH changes that confer no SRE activity (for example, K28R and G26A), and proceed through promiscuous intermediates. b, For paths highlighted in a, SRE mean fluorescence is shown versus mutational distance from AncSR1:EGKA; grey lines connect variants separated by single-nucleotide mutations. Error bars, 90% confidence intervals. Green dashed line, activity of AncSR1+11P:GSKV on SRE. AncSR1:EGKA was represented by only seven cells in the SRE library, so its FACS-seq SRE mean fluorescence estimate is unreliable (and its classification was thus inferred by the predictive model). In isolated flow cytometry experiments, its SRE mean fluorescence was indistinguishable from null alleles; the decrease in SRE mean fluorescence from step 0 to step 1 suggested by this figure is therefore more probably a flat line (no change in SRE activity). c, Stochasticity and contingency in trajectories of functional change. Diagrams illustrate paths from a purple starting point (left) to possible green outcomes (right). In a deterministic trajectory (i), a particular genotype encoding the green function will evolve deterministically if selection favours acquisition of the green function and only that genotype is accessible. The outcome of evolution is stochastic (ii) if multiple outcomes are accessible, so which one occurs is random. An outcome is contingent (iii) if its accessibility depends on the prior occurrence of some step that cannot be driven by selection for that outcome. Contingency and stochasticity can occur independently (ii and iii), or they can co-occur in serial (iv).
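
The stochastic case in panel c (ii) can be made quantitative in the spirit of the two fixation models named in Extended Data Fig. 5c. The Python sketch below is a toy model only: it assumes that each step must increase SRE mean fluorescence, that a walk stops at the first outcome genotype it reaches, and that under the correlated model a step is taken with probability proportional to its fluorescence gain. These details, the function name, and the five-node network are assumptions made for illustration and are not taken from the study.

```python
def outcome_probabilities(start, fluor, neighbours, outcomes, correlated=True):
    """Probability of ending at each outcome genotype when starting from `start`.

    fluor: dict genotype -> mean fluorescence.
    neighbours: dict genotype -> list of single-mutation neighbours.
    outcomes: set of genotypes at which a walk stops.
    """
    probs = {}

    def walk(node, p):
        if node in outcomes:
            probs[node] = probs.get(node, 0.0) + p
            return
        # Only steps that increase fluorescence are allowed, so walks cannot cycle.
        steps = [n for n in neighbours.get(node, []) if fluor[n] > fluor[node]]
        if not steps:
            return  # dead end: this probability mass reaches no outcome
        if correlated:
            weights = [fluor[n] - fluor[node] for n in steps]  # assumed weighting
        else:
            weights = [1.0] * len(steps)  # any enhancing step equally likely
        total = sum(weights)
        for n, w in zip(steps, weights):
            walk(n, p * w / total)

    walk(start, 1.0)
    return probs

# Hypothetical toy network, for illustration only.
fluor = {"A": 1.0, "B": 2.0, "C": 4.0, "X": 5.0, "Y": 6.0}
neighbours = {"A": ["B", "C"], "B": ["X"], "C": ["Y"]}
outcomes = {"X", "Y"}

equal = outcome_probabilities("A", fluor, neighbours, outcomes, correlated=False)
corr = outcome_probabilities("A", fluor, neighbours, outcomes, correlated=True)
```

In this toy network the two outcomes are equally likely under the equal fixation model, whereas the correlated model favours the outcome reached through the larger first step. A deterministic trajectory, in the sense of panel c (i), corresponds to a single outcome receiving probability 1.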

Extended Data Figure 8 The effect of historical permissive substitutions is mediated by non-specific increases in affinity.

a–d, The 11P substitutions non-specifically increase transcriptional activity as measured by FACS-seq, consistent with FoldX predictions of effects on binding affinity. a, Classification of SRE-specific variants as 11P-dependent (orange) and 11P-independent (yellow) on the basis of their functions in AncSR1 and AncSR1+11P backgrounds. Icons for individual variants specifically assessed in b and c are shown. b, FACS-seq mean fluorescence estimates for 11P-dependent (orange) and 11P-independent (yellow) RH variants in the AncSR1 (left) and AncSR1+11P (right) backgrounds, shown as box-and-whisker plots as in Fig. 4a. Icons represent variants validated in c. P values, Wilcoxon rank-sum test with continuity correction. The mean fluorescence of 11P-independent genotypes is significantly higher in the AncSR1 background but not in AncSR1+11P. c, Validation of apparently restrictive effect of 11P on some genotypes. For three variants non-functional in AncSR1+11P but SRE-specific in AncSR1 FACS-seq assays (×), we measured mean fluorescence of isogenic cultures by flow cytometry. We also assayed variants SRE-specific in AncSR1+11P and SRE-specific (square) or non-functional (open circle) in AncSR1, as validation controls. Isogenic mean fluorescence is represented as mean ± s.e.m. from three replicate transformations and inductions analysed via flow cytometry. All FACS-seq classifications were validated except for the three apparently restricted variants in AncSR1+11P (highlighted in red), which are in fact strong SRE-activators in this background. Each of these variants was predicted to be a strong SRE-binder on the basis of its genotype, but had an artificially low FACS-seq mean fluorescence estimate, perhaps because of a strong growth defect in inducing conditions. d, After removing the three genotypes with inaccurate FACS-seq fluorescence measurements (×), 11P-independent genotypes have significantly higher mean fluorescence than 11P-dependent genotypes in the AncSR1+11P background, consistent with a non-specific permissive mechanism via affinity. P values, Wilcoxon rank-sum test with continuity correction. e, The 11P substitutions do not alter the genetic determinants of SRE specificity. Each plot shows, for a variable site in the library, the frequency of every amino-acid state in two functionally defined sets of variants. Spearman’s ρ for each correlation is shown. The top row shows that the determinants of SRE specificity are similar in AncSR1 and AncSR1+11P libraries; the bottom row shows a much weaker relationship between the determinants of SRE and ERE specificity within the AncSR1+11P library. f, Biochemical determinants of ERE and SRE specificity in the AncSR1 (top) and AncSR1+11P (bottom) backgrounds. A multiple logistic regression model predicts the probability that a variant is response-element-specific from the biochemical properties of its amino-acid state at each of the four variable RH sites. The coefficients of this model represent the change in log-odds of being ERE-specific or SRE-specific per unit change in each property. Asterisks indicate site-specific determinants that differ significantly between ERE and SRE specificity in each background (Z test, P < 0.05).
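
The Z test in panel f compares a regression coefficient estimated for ERE specificity with the corresponding coefficient for SRE specificity. The legend does not give the formula; the Python sketch below assumes the commonly used form for two independently estimated coefficients, z = (b₁ − b₂) / √(SE₁² + SE₂²), with a two-sided P value from the standard normal distribution. The function name and the numerical inputs are hypothetical.

```python
import math

def coefficient_z_test(b1, se1, b2, se2):
    """Z statistic and two-sided P value for the difference between two
    independently estimated regression coefficients (assumed form)."""
    z = (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # Two-sided tail probability of a standard normal: P(|Z| > |z|).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical coefficients and standard errors, for illustration only.
z, p = coefficient_z_test(b1=1.0, se1=0.3, b2=0.0, se2=0.4)
significant = p < 0.05  # threshold used for the asterisks in panel f
```

With these inputs the pooled standard error is 0.5, so the coefficients differ by two standard errors and the difference would be marked as significant at the stated threshold.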

Extended Data Table 1 Library sampling statistics

Extended Data Table 2 Robustness of inferences to scheme for classification of variants

Supplementary information

Reporting Summary (PDF 115 kb)

Supplementary Table

This file contains a list of oligonucleotide sequences used in this study. (XLSX 23 kb)

Supplementary Table

This file contains a list of all RH sequences and their specificity classifications in each protein background as used in this study. (XLSX 9310 kb)

About this article

Cite this article

Starr, T., Picton, L. & Thornton, J. Alternative evolutionary histories in the sequence space of an ancient protein. Nature 549, 409–413 (2017). https://doi.org/10.1038/nature23902

Received: 23 March 2017
Accepted: 08 August 2017
Published: 13 September 2017
Issue Date: 21 September 2017
DOI: https://doi.org/10.1038/nature23902
