Evolution of new regulatory functions on biophysically realistic fitness landscapes

Friedlander, Tamar; Prizak, Roshan; Barton, Nicholas H.; Tkačik, Gašper

doi:10.1038/s41467-017-00238-8

Download PDF

Article
Open access
Published: 09 August 2017

Evolution of new regulatory functions on biophysically realistic fitness landscapes

Tamar Friedlander¹^na1^nAff2,
Roshan Prizak¹^na1,
Nicholas H. Barton¹ &
…
Gašper Tkačik ORCID: orcid.org/0000-0002-6699-1455¹

Nature Communications volume 8, Article number: 216 (2017) Cite this article

3294 Accesses
18 Citations
17 Altmetric
Metrics details

Subjects

Abstract

Gene expression is controlled by networks of regulatory proteins that interact specifically with external signals and DNA regulatory sequences. These interactions force the network components to co-evolve so as to continually maintain function. Yet, existing models of evolution mostly focus on isolated genetic elements. In contrast, we study the essential process by which regulatory networks grow: the duplication and subsequent specialization of network components. We synthesize a biophysical model of molecular interactions with the evolutionary framework to find the conditions and pathways by which new regulatory functions emerge. We show that specialization of new network components is usually slow, but can be drastically accelerated in the presence of regulatory crosstalk and mutations that promote promiscuous interactions between network components.

Competition and evolutionary selection among core regulatory motifs in gene expression control

Article Open access 13 December 2023

The evolution, evolvability and engineering of gene regulatory DNA

Article 09 March 2022

Robustness and innovation in synthetic genotype networks

Article Open access 28 April 2023

Introduction

Phenotypes evolve largely through changes in gene regulation^1,2,3,4, and such evolution may be flexible and rapid^{5, 6}. Of particular importance are mutations affecting affinity and specificity of transcription factors (TFs) for their upstream signals or for their binding sites, short fragments of DNA that TFs interact with to activate or repress transcription of specific target genes. Mutations in these binding sites or at sites that alter TF specificity are crucial because of their ability to “rewire” the regulatory network—to weaken or completely remove existing interactions and add new ones, either functional or spurious. Emergence of novel functions in such a network will usually be constrained to evolutionary trajectories that maintain a viable pattern of existing interactions. This raises a fundamental question about the effects of such constraints on the accessibility of different regulatory architectures and the timescales needed to reach them.

The case that we focus on here is the divergence of gene regulation, which can give rise to a variety of new phenotypes, e.g., via expansion in TF families. A regulatory function previously accomplished by a single (or several) TF(s) is now carried out by a larger number of TFs, allowing for additional fine-tuning and precision, or, alternatively, for an expansion of the regulatory scope^{7,8,9,10,11,12,13,14,15,16,17}. The main avenue for such expansions are gene duplications^18,19,20. Rapid weakening of expression of the duplicates²¹ or alternatively selection to increase expression^{22, 23} facilitate the preservation and fixation of duplicates. Gene duplications generate extra copies of the TFs and thus provide the “raw material” for evolutionary diversification. Subsequent specialization of TFs often involves divergence in both their inputs (e.g., ligands) and outputs (regulated genes)^{3, 24}. Examples range from repressors involved in bacterial carbon metabolism that arose from the same ancestor via a series of duplication–divergence events²⁵, and ancestral TF Lys14 in the metabolism of Saccharomyces cerevisiae, which diverged into three different TFs regulating different subsets of genes in Candida albicans ²⁶, to many variants of Lim and Pou-homeobox genes involved in neural development across different organisms²⁷. In some systems the ligand sensing and gene regulatory functions are distributed across two or more molecules, as for bacterial two-component pathways²⁸ and eukaryotic signaling cascades²⁹; here, too, specialization can occur by a series of mutations in multiple relevant components.

Immediately following a duplication event, molecular recognition between TFs, their input signals, and their binding sites is specific but undifferentiated between the two TF copies. Under selection to specialize, recognition sequences and ligand preferences of the two TFs can diverge, but only if some degree of matching between TFs and their binding sites is continually retained to ensure network function. Binding sites are thus forced to coevolve in tandem with the TFs. Yet little is known about the resulting limits to evolutionary outcomes, the likelihood of potential evolutionary trajectories, and the relevant timescales; specifically, it is unclear how these quantities of interest depend on important parameters, such as the number of regulated genes, the length and specificity of the binding sites, the correlations between the input signals, etc.

Theoretical understanding of TF duplication is still incomplete, with existing models predominantly belonging to two categories. The first category of gene duplication–differentiation models studies subfunctionalization of isolated proteins (e.g., enzymes) that do not have any regulatory role³⁰. When cis-regulatory mutations that control the expression of the duplicated gene are included^31,32,33,34, this is done in a simplified fashion, e.g., by a small number of discrete alleles that represent TF-binding sites appearing and disappearing at fixed rates^{33, 34}. Because this approach ignores the essentials of molecular recognition, it cannot model co-evolution between TFs and their binding sites—the topic of our interest.

The second category of studies tracks regulatory sequences explicitly and uses a biophysical description of TF–BS (binding site) interactions, properly accounting for the fact that TFs can bind a variety of DNA sequences with different affinities^35,36,37. In conjunction with thermodynamic models of gene regulation^38,39,40,41, this approach has been used to study the evolution of binding sites given a single TF^{37, 42,43,44,45}, while mostly overlooking the issue of TF duplication and subfunctionalization (but see refs ^{46, 47}).

Here we synthesize these two frameworks—the biophysical description of gene regulation and the evolutionary modeling of TF specialization—to construct a realistic description of the fundamental step by which regulatory networks have evolved. A biophysical model of this set-up gives rise to complex fitness landscapes that are markedly different from simple forms considered previously; in what follows, we show that realistic landscapes exert a major influence over the evolutionary outcomes and dynamics. The structure of the paper is as follows. We first introduce the basic model with two TFs and two regulated genes, and analyze its steady state distribution of outcomes, showing that the huge genotypic space maps to very few phenotypes. We next analyze the possible dynamical trajectories and timescales leading to each phenotype. Finally, we extend the basic model to a larger number of regulated genes and study the effect of “promiscuity-promoting mutations,” i.e., mutations that render TFs less specific for their binding sites.

Results

A biophysically realistic fitness landscape

In our model, n _TF transcription factors regulate n _G genes by binding to sites of length L base pairs; for simplicity, we consider each gene to have one such binding site. The specificity of a TF for any sequence is determined by the TF’s preferred (consensus) sequence; sequences matching consensus are assigned lowest energy, E = 0, which corresponds to tightest binding, and every mismatch between the consensus and the binding site increases the energy by $\epsilon $; this additive ‘mismatch’ model has a long history in gene regulation literature^{35, 43, 48, 49}.

The equilibrium probability that the binding site of gene j (j = 1,…,n _G) is bound by active TFs of any type i (i = 1,…,n _TF) is a proxy for the gene expression level and is given by the thermodynamic model of gene regulation^{38, 50}:

$${p_{jm}}\left( {\left\{ {{k_{ij}}} \right\},\left\{ {{C_i}(m)} \right\}} \right) = \frac{{{\sum}_i {{C_i}(m){e^{ - \epsilon {k_{ij}}}}} }}{{1 + {\sum}_i {{C_i}(m){e^{ - \epsilon {k_{ij}}}}} }},$$

(1)

where C _i(m) is dimensionless concentration of active TFs of type i in condition m, k _ij is the number of mismatches between the consensus sequence of the i-th TF species and the binding site of the j-th gene, and $\epsilon $ is the energy per mismatch in units of k _B T. Concentration C _i(m) of active TFs depends on condition m, which can represent either time or space (e.g., during developmental gene expression programs) or a discrete external environment (e.g., the presence/absence of particular chemical signals). The simplest case considered here assumes the existence of two such signals that can be either present or absent, in any combination, for a total number of four possible environments (m = 00, 01, 10, 11), occurring with probabilities α _m; an important parameter will be the correlation, −1 ≤ ρ ≤ 1, between the two signals. Each TF has two binary alleles, σ _i∈[00, 01, 10, 11], determining its specificity for the two signals. If the TF i is responsive to a signal and that signal is present in environment m, then its active concentration C _i(m) = C ₀; otherwise, C _i(m) = 0. Given constants C ₀, $\epsilon $, and the genotype ${\cal D}$—comprising TF consensus and binding site sequences as well as TF sensitivity alleles σ _i—the thermodynamic model of Eq. (1) fully specifies expression levels for all genes in all environments (Supplementary Note 1).

Figure 1a, b illustrates this set-up for a simple case n _TF = n _G = 2, assuming that the two copies of the TF emerged through an initial gene duplication event and are fixed in the population. The original TF regulates two downstream genes by binding to their binding sites. It is sensitive to both external signals, which can be present with a varying degree of correlation (Fig. 1a). After duplication, three types of mutation can occur, as shown in Fig. 1c: point mutations in the binding sites (rate μ), mutations in the TF coding sequence that change TF’s preferred (consensus) specificity (rate r _TF μ) and mutations in the two signal-sensing alleles (rate r _S μ), which can give each TF specificity to both signals, to one of them, or to neither. An example in Fig. 1d shows the state of the system after several mutations have affected the degree of (mis)match between the TFs and the binding sites, k _ij; an especially important quantity that tracks the overall divergence of the TF specificity is denoted as M, the match between the two TF consensus sequences.

To complete the evolutionary model, a fitness function is required. We assume selection for the genes to acquire distinct expression patterns in response to external signals, and thus define this fully specialized state as having the highest fitness in our model. Specifically, we penalize the deviations in actual gene expression, p _jm, from the ideal expression levels, $p_{jm}^*$:

$$F = - s \displaystyle {\sum_j} {\sum_m} \alpha_m \beta_{jm} {\left( p_{jm} - p_{jm}^{*} \right)}^2 ,$$

(2)

where the ideal expression level $p_{jm}^*$ is 1 (fully induced) for the first gene if signal 1 is present and the expression is 0 (not induced) otherwise, and similarly for the second gene; β _jm can be used to vary the relative weight of different errors (e.g. of a gene being uninduced when it should be induced and vice versa, see Supplementary Note 3), and s is the selection intensity. Importantly, selection does not directly depend on the TFs, but only on the expression state of the genes they regulate; genes, however, can only be expressed when TFs bind to proper binding sites, implicitly selecting on TFs. For this reason it is also very easy to generalize our model to regulation by repressor TFs, a case we explore in Supplementary Note 2.

We consider mutation rates to be low enough that a beneficial mutation fixes before another beneficial mutation arises⁵¹, allowing us to assume that the population is almost always fixed. The probability that the population occupies a particular genotypic state, $P({\cal D},t)$, evolves according to a continuous-time discrete-space Markov chain that specifies the rate of transition between any two genotypes. The transition rates are a product between the mutation rates between different states and the fixation probability that depends on the fitness advantage a mutant has over the ancestral genotypes^{43, 52}. The size of genotype space is high-dimensional but still tractable, because our model only requires us to keep track of mismatches and not full sequences, i.e., to write out the dynamical equations for the reduced-genotypes, ${\cal G} = \left\{ {M,{k_{ij}},{\sigma _i}} \right\}$. Standard Markov chain techniques can then be used to compute the evolutionary steady state, first hitting times to reach specific evolutionary outcomes, or to perform stochastic simulations (Supplementary Methods).

Figure 2 shows the interplay of biophysical constraints that give rise to a realistic fitness landscape for our problem. Given a match, M, between two TF consensus sequences, only certain combinations of mismatches, (k _1j,k _2j), of the TFs with each of the two binding sites are possible. A particular allowed combination can be realized by different numbers of genotypes, as shown in Fig. 2a, providing a detailed account of the entropy of the neutral distribution. For each of the four environments, Eq. (1) predicts gene expression at every pair of mismatch values (Fig. 2b); together with the probabilities of different environments occurring, the gene expression pattern determines the genotypes’ fitness, F. TF specialization then unfolds on this landscape by different types of mutations (e.g., Fig. 2c). Although the landscape is complex and high-dimensional, it is highly structured and ultimately fully specified by only a handful of biophysical parameters. Furthermore, because of the sigmoidal shape of binding probability as a function of mismatch k (Eq. (1)), it is possible to assign phenotypes of ‘strong’ and ‘weak’ binding to every TF–BS interaction, allowing us to depict network interactions graphically, as shown in Fig. 2d, and to classify the possible macroscopic evolutionary outcomes, as we will show next.

Evolutionary outcomes in steady state

Evolutionary outcomes in steady state are determined by a balance between selection and drift. The steady state distribution over reduced-genotypes is ⁵³

$${P_{{\rm{SS}}}}({\cal G}) = P({\cal G},t \to \infty ) = {P_0}({\cal G}){\rm{exp}}(2NF({\cal G})),$$

(3)

where P ₀ is the neutral distribution of genotypes and N is the population size. Eq. (3) is similar to the energy/entropy balance of statistical physics^{42, 54}, with fitness F playing the role of negative of energy and log P ₀ the role of entropy; in our model, both of these quantities are explicitly computable, as is the resulting steady state distribution.

Understanding the high dimensional distribution over genotypes is difficult, but classification of individual TF–BS interactions into “strong” and “weak” ones, as described above, allows us to systematically and uniquely assign every genotype to one of a few possible macroscopic outcomes, or “macrostates,” graphically depicted in Fig. 3a and defined precisely in Supplementary Note 1. Thus, in the ‘No Regulation’ state, input signals are not transduced to the target genes, either because TF–BS mismatches are high and there is no binding or because TFs themselves lose responsiveness to the input signals; in the ‘One TF Lost’ state, a single TF regulates both genes (as before duplication), while the other TF is lost, i.e., its specificity has diverged so far that it does not bind any of the sites; the ‘Specialize Binding’ state corresponds to each TF regulating its own gene without cross-regulating the other but the signal sensing domains are not yet signal specific, as they are in the ‘Specialize Both’, the state which we have defined to have the highest fitness. Finally, the ‘Partial’ macrostate predominantly features configurations where each of the TFs binds at least one binding site, but one of the TFs still binds both sites or retains responsiveness for both input signals; functionally, these configurations lead to large “crosstalk,” where input signals are non-selectively transmitted to both target genes.

Ultimately, these macrostates are the functional network phenotypes that we care about. The number of genotypes in each macrostate, however, can vary by orders of magnitude; for example, the ‘No Regulation’ state is larger by ∼10⁴ relative to the high-fitness ‘Specialize Both’ state, for our baseline choice of parameters (L = 5, $\epsilon $ = 3). Selection can act against this strong entropic bias, and the distribution of fitness values across genotypes within each macrostate is shown in Fig. 3b. Clearly, the mean or median fitness within each macrostate is a poor substitute for the detailed structure of fitness levels that depend nonlinearly on TF–BS mismatches and the degeneracy of the sequence space. Unlike the entropic term in Fig. 3b, fitness also depends on the statistics of the environment, α _m, and in particular, the correlation ρ between the two signals. For example, when the signals are strongly correlated, the ‘Initial’ state right after duplication or the ‘One TF Lost’ state can achieve quite high fitnesses, since responding to the wrong signal or having a high degree of crosstalk will still ensure largely appropriate gene expression pattern in all likely environments. In contrast, at strong negative correlation, many genotypes in ‘Specialize Binding’ and ‘Initial’ states will suffer a large fitness penalty because their sensing domains are not specialized for the correct signals, while the ‘Specialize Both’ state will have high fitness regardless of the environmental signal correlation.

How do fitness and entropy combine to determine macroscopic evolutionary outcomes? Fig. 3c shows the most probable macrostate as a function of selection strength and signal correlation (Supplementary Note 2). At weak selection, specific TF–BS interactions cannot be maintained against mutational entropy and the system settles into the most numerous, ‘No Regulation’ state. Higher selection strengths can maintain a limited number of TF–BS interactions in ‘Partial’ states. Beyond a threshold value for Ns, the evolutionary outcome depends on the signal correlation: when signals are anti-correlated or weakly correlated, the TFs reach the fully specialized state, whereas high positive correlation favors losing one TF and having the remaining TF regulate both genes and respond to both signals. As signal correlation increases, so does the selection strength required to support full specialization. Detailed insight at a fixed value of Ns is provided by plotting the free fitness $\hat F$, as in Fig. 3d, which combines the fitness and the entropy of the neutral distribution from Fig. 3b into a single quantity that determines the likelihood of each macrostate given ρ; the macrostate with highest free fitness is shown as the most probable outcome in Fig. 3c for Ns = 25, but free fitness also allows us to see, quantitatively, how much more likely the dominant macrostate is relative to other outcomes. Figure 3e examines the case where not only the correlation, ρ, but also the frequencies, f ₁, f ₂, of encountering both signals are varied: for low frequencies, even selection strength of Ns = 25 is insufficient to maintain TF specificity against drift, while for high frequencies and positive correlation one TF is lost while the remaining TF regulates both genes (Supplementary Note 2).

The map of evolutionary outcomes is very robust to parameter variations. The energy scale of TF–DNA interactions is that of hydrogen bonds: $\epsilon \sim 3$ (in k _B T units), consistent with direct measurements. The scale of C ₀ is set to ensure that consensus sites are occupied at saturation while fully mismatching sites are essentially empty. The only remaining important biophysical parameter is L, the length of the binding sites. As expected, increasing L expands the regions of ‘No Regulation’ and ‘Partial’ at low Ns, due to entropic effects. Surprisingly, however, one can demonstrate that the important boundary between the ‘Specialize’ and ‘One TF Lost’ states is independent of L; furthermore, the map in Fig. 3c is exactly robust to the overall rescaling of the mutation rate, μ, and even to separate rescaling of individual rates r _S, r _TF.

TFs can also act as repressors, whereby a gene is active unless a repressor binds its binding site and inhibits its expression. The analysis in that case is very similar to the activator case. The evolutionary outcomes differ only if the penalties β _jm are non-uniform. Specifically, we consider that unnecessary gene activation incurs a lower penalty β _jm than does failure to activate a gene when needed. Due to the scarcity of genotypes allowing for TF–BS binding compared to the abundance of genotypes for which no binding occurs, this effectively scales the selection pressure, such that higher selection pressure values Ns are required to obtain the more specialized macrostates ‘Partial’ followed by ‘Specialize Both’ (Supplementary Note 2).

We compare the steady-state marginal distributions of TF–BS mismatches and the match, M, between the two TFs, under strong selection to specialize (Ns = 25) vs neutral evolution (Ns = 0). Mismatch distributions for k ₁₁ and k ₂₁ in Fig. 3f display a clear difference in the two regimes: strong selection favors a small mismatch of the BS with the cognate TF, sufficient to ensure strong binding but nonzero due to entropy, and a large mismatch with the noncognate TF, to reduce crosstalk. Surprisingly, however, the distribution of matches M between two TF consensus sequences shows only a tiny signature of selection, with both distributions peaking around one match. As a consequence, inferring selection to specialize from measured binding preferences of real TFs might not be feasible with realistic amounts of data.

Evolutionary dynamics and fast pathways to specialization

Next, we focus on evolutionary trajectories and the timescales to reach the fully specialized state after gene duplication. An example trajectory is shown in Fig. 4a: the two TFs start off identical (with maximal match, M = L = 5) until, as a result of the loss of specificity for both signals, TF1 starts to drift, diverging from TF2 (sharply decreasing M in ‘One TF Lost’ state) and losing interactions with both binding sites. Subsequently TF1 reacquires preference to the red signal, which drives the reestablishment of TF1 specificity for one binding site during a short ‘Specialize Binding’ epoch, followed quickly by the specialization of TF2 for the green signal at the start of ‘Specialize Both’ epoch of maximal fitness.

Dynamics of the TF–TF match, M, and the scaled fitness, NF, become smooth and gradual when discrete transitions and the consequent large jumps in fitness are averaged over individual realizations, as in Fig. 4b. Importantly, we learn that the sequence of dominant macrostates leading towards the final (and steady) state, ‘Specialize Both’, involves a long intermediate epoch when the system is in the ‘One TF Lost’ state. We examine this sequence of most likely macrostates in detail in Fig. 4c, and visualize it analogously to the map of evolutionary outcomes in steady state shown in Fig. 3c. High Ns and correlation (ρ) values favor trajectories passing through the ‘One TF Lost’ state, while intermediate Ns (5 ≲ Ns ≲ 20) and low correlation values enable transitions through ‘Partial’ macrostate; along the latter trajectory, the binding of neither TF is completely abolished. Typical dwell times in dominant states, indicated as contours in Fig. 4c, suggest that specialization via the ‘One TF Lost’ state should be slower than through the ‘Partial’ state, which is best seen at t = 1/μ, where specialization has already occurred at intermediate Ns and low, but not high, ρ values.

It is easy to understand why pathways towards specialization via the ‘One TF Lost’ state are slow. As the example in Fig. 4a illustrates, so long as one TF maintains binding to both sites and thus network function (especially when signals are strongly correlated), the other TF’s specificity will be unconstrained to neutrally drift and lose binding to both sites, an outcome which is entropically highly favored. After the TF’s sensory domain specializes, however, the binding has to re-evolve essentially from scratch in a process that is known to be slow⁴⁵ unless selection strength is very high. In contrast to this ‘Slow’ pathway, the ‘Fast’ pathway via the ‘Partial’ state relies on sequential loss of “crosstalk” TF–BS interactions, with the divergence of TF consensus sequences followed in lock-step by mutations in cognate binding sites. Specifically, the likely intermediary of the fast pathway is a ‘Partial’ configuration in which the first TF responds to both signals but only regulates one gene, whereas the second TF is already specialized for one signal, but still regulates both genes.

The fast and the slow pathways are summarized in Fig. 4d. A detailed analysis (Supplementary Note 4) reveals how different biophysical and evolutionary parameters change the relative probability and the average duration (Fig. 4e) of both pathways. For example, increasing the length, L, of the binding sites favors the slow pathway as well as drastically increases its duration, leading to very slow evolutionary dynamics. In contrast, time to specialize via the fast pathway is unaffected by an increase in L. Increasing the rate of TF-specificity-affecting mutations, r _TF, has a qualitatively similar effect, while increasing the mutation rate affecting the sensory domain, r _S, favors the fast pathway. Indeed, in the limit when r _S is much larger than the other two mutation rates, the sensing domain specializes almost instantaneously, making the complete loss of binding by either TF very deleterious and thus avoiding the ‘One TF Lost’ state; the adaptation dynamics is initially rapid, with binding sites responding to diverging TF consensus sequences, and subsequently slow, when TF consensus sequences further minimize their match, M, in a nearly neutral process.

Promiscuity-promoting mutations

Typically, each TF must regulate more than one target gene. As the number of regulated genes per TF (n _G/n _TF) increases, intuition suggests that the evolution of the TF’s consensus sequence should become more and more constrained: while a mutation in an individual binding site can lower the total fitness by increasing mismatch and thereby impeding TF–BS binding, a single mutation in the TF’s consensus has the ability to simultaneously weaken the interaction with many binding sites, leading to a high fitness penalty. Our analysis of the biophysical fitness landscape confirmed that the landscape gets progressively more frustrated as the number of regulated genes per TF increases, due to the explosion of constraints that TFs have to satisfy to ensure the maintenance of functional regulation (Supplementary Note 5). Consequently, one can expect extremely long times to specialization. How can it nevertheless proceed at observable rates?

Energy matrices for many real TFs display ‘promiscuous’ specificity where, at a particular position within the binding site, binding to multiple nucleotides is equally preferable. We wondered how our findings would be affected if consensus sequence specificity of the TFs could pass through such intermediate promiscuous states. Figure 5a shows how TF consensus sequence and the corresponding binding site can co-evolve using point mutations, or using the new “promiscuity-promoting” mutation type for the TF: promiscuity-promoting mutation renders one position in the recognition sequence of the TF insensitive to the corresponding DNA base in the binding site (Supplementary Note 6). Evolutionary pressure on the binding sites is therefore temporarily relieved, until the specificity of the TF is re-established by a back mutation. Without promiscuity-promoting mutations, TF–BS co-evolution must proceed in a tight sequence of compensatory mutations; with promiscuity-promoting mutations, such a precise sequence is no longer required, although one extra mutation is needed to reestablish high TF–BS specificity. With promiscuity, the fraction of deleterious mutations along the evolutionary path towards specialization is reduced, an effect that grows stronger with increasing L. As shown in Fig. 5b, this has drastic effects on the time to specialization. Without promiscuity, increasing the selection strength, Ns, decreases the required time when each TF regulates one gene, as expected for a landscape with large neutral plateaus but with no fitness barriers. For n _G > 2, however, the landscape develops barriers that need to be crossed, and evolutionary time starts increasing with Ns. In contrast, promiscuity enables fast emergence of TF specialization even with multiple regulated genes in a broad range of evolutionary parameters (although there are also costs due to high promiscuity).

Discussion

The role that the shape of a fitness landscape plays for the dynamics and the final outcomes of evolution has been appreciated in population genetics for a long time. This has stimulated a large body of theoretical research into evolution on toy model landscapes^{55, 56}, as well as motivated efforts to map out real, small-scale landscapes experimentally. For limited classes of problems, mostly those involving molecular recognition, biophysical constraints are informative enough to permit computational exploration of complex landscapes. Such is the case for the secondary structure of RNA⁵⁷, antibody–antigen interactions⁵⁸, protein–protein interactions⁵⁹, and TF–DNA binding⁶⁰, explored here. We exploit this prior knowledge to construct a fitness landscape for a more complicated evolutionary event, the specialization of two TFs after duplication, a key evolutionary step by which gene regulatory networks expand. The biophysical model naturally captures a number of essential features, without having to introduce them ‘by hand’: the fact that specialization is driven by avoidance of regulatory crosstalk; the importance of the mutational entropy; the dependence on number of downstream genes; the existence of transient network configurations preceding specialization, which crucially impact dynamics; and the importance for evolutionary outcomes of the statistical properties of the signals that TFs respond to. Importantly, the expressive power of our framework does not come at increased modeling cost: while complex, the fitness landscape is still determined only by a few, mostly known, parameters, and an exponentially large space of genotypes can be systematically coarse grained to a small set of functional network phenotypes. This combination of biophysical and co-evolutionary approaches is applicable generally to the evolution of molecular interactions, e.g., in protein interaction networks.

In steady state, our results robustly identify correlation between the environmental signals that drive TFs as a key determinant for specialization, as shown in Fig. 3c–e. Unless the new signal, for which a post-duplication TF can specialize, is sufficiently independent (uncorrelated) from the existing signals that the regulatory network processes, one TF copy will be lost due to drift. As a consequence, the effective dimensionality of environmental signals dictates the complexity of genetic regulatory networks⁶¹, reminiscent of information-theoretic tradeoffs in sensory neuroscience⁶²; in evolutionary terms, selection to maintain complex regulation needs to withstand the mutational flux into vastly more numerous but less functional network phenotypes. Recently, it has been shown that finite biochemical specificity also limits the complexity of genetic regulatory networks⁶³; an interesting direction for future research is to understand how the balance between regulatory crosstalk, environmental signal statistics, and evolutionary constraints ultimately determines the number of TFs that can be stably maintained. A related question concerns the expected match between pairs of TFs in a large network as a signature of selection for specialized function; for an isolated pair of TFs, our results in Fig. 3f predict only a tiny deviation from neutrality.

Timescales and pathways to specialization are completely shaped by the properties of the biophysical fitness landscape, and thus cannot be captured by simple allelic models that ignore the topology of the sequence space (Supplementary Note 7). We show that the fast pathway to specialization transitions through ‘Partial’ states where neither of the two TFs completely loses binding. Interestingly, it is exactly the existence of crosstalk interactions that permits fast adaptation via these transient states, by maintaining the network function through one TF, while the other is free to diverge in a series of mutations to the TF and its future binding site⁶⁴. Crosstalk thus enables some amount of network plasticity during early adaptation, yet is ultimately selected against, when TFs become fully specialized^{65, 66}. In the protein–protein-interaction literature, ‘Partial’ states are sometimes referred to as promiscuous states, and they have been suggested as evolutionarily accessible intermediaries that relieve the two interacting molecules of the need to evolve in a tight (and likely very slow) series of compensatory mutations⁶⁷. In contrast to the fast pathway, the slow pathway involves a complete loss of TF–BS binding interactions; the long timescale emerges from long dwell times while the TF and the binding sites evolve in a nearly neutral landscape before TF–BS specificity is reacquired. Long-binding sites and (perhaps counter-intuitively) fast TF mutation rates favor the slow pathway, while fast sensing domain mutation rates favor the fast pathway.

The situation changes qualitatively when each TF regulates more genes⁶⁸. On the one hand, entropy makes pathways that pass through the ‘One TF Lost’ state dynamically uncompetitive, as multiple binding sites would have to emerge de novo to reestablish interactions with a diverged TF. This would favor fast pathways through ‘Partial’ states. On the other hand, the biophysical fitness landscape develops frustration (or sign epistasis) as n _G > 2 and the timescales to specialization lengthen with increasing selection strength when passing through ‘Partial’ states. We demonstrate that frustration is relieved by promiscuity-promoting mutations in the TF, enabling fast emergence of specialization even with multiple-regulated genes.

Recent experimental works have demonstrated how a combination of cis and trans mutations can rewire gene regulatory networks allowing for the emergence of new functions via transient and promiscuous configurations, in accordance with our model¹⁵. While we focused on a specific evolutionary scenario involving TF duplication, gene regulatory networks can rewire in numerous other ways. For example, Sayou et al. studied the evolution of TF–DNA binding specificity while the TF remains present in a single copy¹⁴. Duplicated TFs can also be re-used in ways that are different from what we considered²⁶. Our results do, however, make predictions for expected timescales to reach different network configurations after gene duplication, which can be compared to bioinformatic data; alternatively, genomic data on TF duplication events could be used to infer selection pressures favoring regulatory divergence.

Taken together, our results paint a picture of TF specialization that most likely proceeds through intermediate states with high crosstalk, in which one TF has already specialized for its input signals but not yet for the target genes, while the other TF is not yet specialized for the input signals but only regulates one gene. In addition, these intermediate states are likely to be more promiscuous, binding different sites with the same affinity, with the promiscuity reverting to specific binding towards the end of specialization. This picture is qualitatively different from the paradigmatic idea of a simple and sequential progression of compensatory mutations in the TF and its binding sites^{46, 69}. It depends fundamentally on the biophysical model of TF–BS interactions, predicts significantly faster specialization times, as well as the existence of promiscuous TF variants that are starting to be observed in genomic analyses of duplication-specialization events^{14, 15}.

Methods

We consider mutation rates to be low enough that a beneficial mutation fixes before another beneficial mutation arises⁵¹, allowing us to assume that the population is almost always captured by a single genotype. The probability that the population occupies a particular genotypic state, $P({\cal D},t)$, evolves according to a continuous-time discrete-space Markov chain. Transition rates between states are a product between the mutation rates between different genotypes and the fixation probabilities that depend on the fitness advantage a mutant has over the ancestral genotype^{43, 52}, r _xy = 2Nμ _xyΦ_y→x, where N is the population size, μ _xy is the mutation rate from genotype y to x, and Φ_y→x is the probability of fixation of a single copy of x in a population of y. Our model only requires us to keep track of mismatches and not full sequences (i.e., the reduced-genotypes, ${\cal G} = \left\{ {M,{k_{ij}},{\sigma _i}} \right\}$), which significantly reduces the genotype space dimensionality. This framework allows for calculation of the steady state distribution of genotypes, or reduced-genotypes Eq. (3) and classification of genotypes into relevant macrostates.

To calculate the neutral distribution P ₀ of the reduced-genotypes (distribution in the absence of selection), we enumerate the number of possible BS sequences j that have mismatch values (k _1j,k _2j) with respect to two TFs that match each other at M out of L consensus positions:

$${N_{{\kern 1pt} {\rm{seq}}{\kern 1pt} }}\left( {{k_1},{k_2}{\rm{|}}M} \right) \hskip -2pt= \mathop {\sum}\limits_{{j_0} = j_0^{{\rm{min}}}}^{j_0^{{\rm{max}}}} {\left( {\begin{array}{*{20}{c}} M \\ {{J_0}} \\ \end{array}} \right){3^{M - {j_0}}}\left( {\begin{array}{*{20}{c}} {L - M} \\ {L - {j_0} - {k_1}} \\ \end{array}} \right)\left( {\begin{array}{*{20}{c}} {{j_0} + {k_1} - M} \\ {L - {j_0} - {k_2}} \\ \end{array}} \right){2^{{k_1} + {k_2} + 2{j_0} - L - M}}} \\ j_{\rm{0}}^{{\rm{min}}} \hskip -3pt= {\rm{max}}\left( {{\rm{max}}\left( {0,M - {\rm{min}}\left( {{k_1},{k_2}} \right)} \right),\left\lceil {\frac{{L + M - {k_1} - {k_2}}}{2}} \right\rceil } \right) \hskip3.6pc\\ j_0^{{\rm{max}}} = {\rm{min}}\left( {M,L - {\rm{max}}\left( {{k_1},{k_2}} \right)} \right) \hskip13.3pc$$

(4)

The neutral distribution (up to proportionality constant) equals

$${P_0}(x) \sim {N_{{\rm{seq}}}}\left( {{k_{11}},{k_{21}}{\rm{|}}M} \right){N_{{\rm{seq}}}}\left( {{k_{12}},{k_{22}}{\rm{|}}M} \right)\left( {\begin{array}{*{20}{c}} L \\ M \\ \end{array}} \right){3^{L - M}}.$$

(5)

We iterate this calculation for various parameter combinations (Ns, ρ, f _1,2). For each, we determine the most probable macrostate at steady state (Supplementary Note 2) as illustrated in Fig. 3.

To determine evolutionary dynamics we numerically integrate $P({\cal G},t)$ in time-steps corresponding to one generation t _g:

$$P\left( {{\cal G},t + {t_g}} \right) = P({\cal G},t) + {\bf{R}}{t_g}P({\cal G},t),$$

(6)

where R is the Markov chain transition matrix. Again, at every time-point we determine the most probable macrostate (Supplementary Note 2), as illustrated in Fig. 4.

To follow different pathways to specialization and the timescale to reach each, we calculate mean first hitting time T _S←x from any reduced-genotype x, to a subset of reduced-genotypes S, by using the recursive equation

$${T_{S \leftarrow x}} = {t_g} + \mathop {\sum}\limits_y {{a_{yx}}{T_{S \leftarrow y}}} ,$$

(7)

where a _yx are the elements of the transition probability matrix A = I + R t _g. In particular, we consider subsets S _z of genotypes that belong to a particular macrostate z, and compute the mean first hitting times, ${T_{{S_z} \leftarrow x}}$, to this macrostate. Time to specialization, τ, is the time to reach ‘Specialize Both’ macrostate. We also calculate “dwell times”, t ^dwell(z) by using a similar procedure. Dwell time in a particular macrostate z, is the mean (taken over all the genotypes in z, S _z) first hitting time to any other macrostate, starting from S _z.

We supplement these analytical solutions by stochastic simulations. Using Gillespie algorithm⁷⁰, we draw random times in which substitutions between distinct (reduced-)genotypes occurred. At each simulation run a we generate a specific evolutionary trajectory. By repeating this procedure numerous times, we obtain statistics over the distributions and evolutionary pathways. We use stochastic simulations to either validate the analytical calculations or substitute them when they are hard. That is the case, for example, for calculation of mean hitting time to a particular macrostate conditioned on not hitting another macrostate before, as in {τ _fast} and {τ _slow} (Fig. 4). More details about the methods are given in Supplementary Methods.

Data availability

The authors declare that all data supporting the findings of this study are available within the article and its Supplementary Information file.

References

King, M. C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).
Article ADS CAS PubMed Google Scholar
Gilad, Y., Oshlack, A., Smyth, G. K., Speed, T. P. & White, K. P. Expression profiling in primates reveals a rapid evolution of human transcription factors. Nature 440, 242–245 (2006).
Article ADS CAS PubMed Google Scholar
Wray, G. A. The evolutionary significance of cis-regulatory mutations. Nat. Rev. Genet. 8, 206–216 (2007).
Article CAS PubMed Google Scholar
Carroll, S. B. Evolution at two levels: on genes and form. PLoS Biol. 3, e245 (2005).
Article PubMed PubMed Central Google Scholar
Yona, A. H., Frumkin, I. & Pilpel, Y. A relay race on the evolutionary adaptation spectrum. Cell 163, 549–559 (2015).
Article CAS PubMed Google Scholar
Babu, M. M., Teichmann, S. A. & Aravind, L. Evolutionary dynamics of prokaryotic transcriptional regulatory networks. J. Mol. Biol. 358, 614–633 (2006).
Article Google Scholar
Kacser, H. & Beeby, R. Evolution of catalytic proteins: on the origin of enzyme species by means of natural selection. J. Mol. Evol. 20, 38–51 (1984).
Article ADS CAS PubMed Google Scholar
Simionato, E. et al. Origin and diversification of the basic helix-loop-helix gene family in metazoans: insights from comparative genomics. BMC Evol. Biol. 7, 33 (2007).
Article PubMed PubMed Central Google Scholar
Larroux, C. et al. Genesis and expansion of metazoan transcription factor gene classes. Mol. Biol. Evol. 25, 980–996 (2008).
Article CAS PubMed Google Scholar
Hobert, O., Carrera, I. & Stefanakis, N. The molecular and gene regulatory signature of a neuron. Trends Neurosci. 33, 435–445 (2010).
Article CAS PubMed PubMed Central Google Scholar
Achim, K. & Arendt, D. Structural evolution of cell types by step-wise assembly of cellular modules. Curr. Opin. Genet. Dev. 27, 102–108 (2014).
Article CAS PubMed Google Scholar
McKeown, A. N. et al. Evolution of DNA specificity in a transcription factor family produced a new gene regulatory module. Cell 159, 58–68 (2014).
Article CAS PubMed PubMed Central Google Scholar
Baker, C. R., Tuch, B. B. & Johnson, A. D. Extensive DNA-binding specificity divergence of a conserved transcription regulator. Proc. Natl Acad. Sci. USA 108, 7493–7498 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
Sayou, C. et al. A promiscuous intermediate underlies the evolution of LEAFY DNA binding specificity. Science 343, 645–648 (2014).
Article ADS CAS PubMed Google Scholar
Pougach, K. et al. Duplication of a promiscuous transcription factor drives the emergence of a new regulatory network. Nat. Commun. 5, 4868 (2014).
Article CAS PubMed PubMed Central Google Scholar
Nadimpalli, S., Persikov, A. V. & Singh, M. Pervasive variation of transcription factor orthologs contributes to regulatory network evolution. PLoS Genet. 11, e1005011 (2015).
Article PubMed PubMed Central Google Scholar
Arendt, D. The evolution of cell types in animals: emerging principles from molecular studies. Nat. Rev. Genet. 9, 868–882 (2008).
Article CAS PubMed Google Scholar
Ohno, S. Evolution by Gene Duplication (Springer-Verlag, 1970).
Magadum, S., Banerjee, U., Murugan, P., Gangapur, D. & Ravikesavan, R. Gene duplication as a major force in evolution. J. Genet. 92, 155–161 (2013).
Article PubMed Google Scholar
Yona, A. H. et al. Chromosomal duplication is a transient evolutionary solution to stress. Proc. Natl Acad. Sci. USA 109, 21010–21015 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Lan, X. & Pritchard, J. K. Coregulation of tandem duplicate genes slows evolution of subfunctionalization in mammals. Science 352, 1009–1013 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Conant, G. C., Birchler, J. A. & Pires, J. C. Dosage, duplication, and diploidization: clarifying the interplay of multiple models for duplicate gene evolution over time. Curr. Opin. Plant Biol. 19, 91–98 (2014).
Article CAS PubMed Google Scholar
Loehlin, D. W. & Carroll, S. B. Expression of tandem gene duplicates is often greater than twofold. Proc. Natl Acad Sci. USA 113, 5988–5992 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Wittkopp, P. J. & Kalay, G. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69 (2012).
Article CAS Google Scholar
Nguyen, C. C. & Saier, M. H. Phylogenetic, structural and functional analyses of the LacI-GalR family of bacterial transcription factors. FEBS Lett. 377, 98–102 (1995).
Article CAS PubMed Google Scholar
Pérez, J. C. et al. How duplicated transcription regulators can diversify to govern the expression of nonoverlapping sets of genes. Genes Dev. 28, 1272–1277 (2014).
Article PubMed PubMed Central Google Scholar
Hobert, O. & Westphal, H. Functions of LIM-homeobox genes. Trends Genet. 16, 75–83 (2000).
Article CAS PubMed Google Scholar
Parkinson, J. S. Signal transduction schemes of bacteria. Cell 73, 857–871 (1993).
Article CAS PubMed Google Scholar
Bowler, C. & Chua, N. H. Emerging themes of plant signal transduction. Plant Cell 6, 1529–1541 (1994).
Article CAS PubMed PubMed Central Google Scholar
Innan, H. & Kondrashov, F. The evolution of gene duplications: classifying and distinguishing between models. Nat. Rev. Genet. 11, 97–108 (2010).
Article CAS PubMed Google Scholar
Force, A. et al. Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151, 1531–1545 (1999).
CAS PubMed PubMed Central Google Scholar
Lynch, M. & Force, A. The probability of duplicate gene preservation by subfunctionalization. Genetics 154, 459–473 (2000).
CAS PubMed PubMed Central Google Scholar
Force, A. et al. The origin of subfunctions and modular gene regulation. Genetics 170, 433–446 (2005).
Article CAS PubMed PubMed Central Google Scholar
Proulx, S. R. Multiple routes to subfunctionalization and gene duplicate specialization. Genetics 190, 737–751 (2012).
Article CAS PubMed PubMed Central Google Scholar
Maerkl, S. J. & Quake, S. R. A systems approach to measuring the binding energy landscapes of transcription factors. Science 315, 233–237 (2007).
Article ADS CAS PubMed Google Scholar
Wunderlich, Z. & Mirny, L. A. Different gene regulation strategies revealed by analysis of binding motifs. Trends Genet. 25, 434–440 (2009).
Article CAS PubMed PubMed Central Google Scholar
Payne, J. L. & Wagner, A. The robustness and evolvability of transcription factor binding sites. Science 343, 875–877 (2014).
Article ADS CAS PubMed Google Scholar
Shea, M. A. & Ackers, G. K. The OR control system of bacteriophage lambda: a physical-chemical model for gene regulation. J. Mol. Biol. 181, 211–230 (1985).
Article CAS PubMed Google Scholar
Kinney, J. B., Murugan, A., Callan, C. G. & Cox, E. C. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl Acad. Sci. USA 107, 9158–9163 (2010).
Article ADS CAS PubMed PubMed Central Google Scholar
Sherman, M. S. & Cohen, B. A. Thermodynamic state ensemble models of cis-regulation. PLoS Comput. Biol. 8, e1002407 (2012).
Article ADS MathSciNet CAS PubMed PubMed Central Google Scholar
He, X., Samee, M. A. H., Blatti, C. & Sinha, S. Thermodynamics-based models of transcriptional regulation by enhancers: the roles of synergistic activation, cooperative binding and short-range repression. PLoS Comput. Biol. 6, e1000935 (2010).
Article ADS PubMed Central Google Scholar
Berg, J., Willmann, S. & Lässig, M. Adaptive evolution of transcription factor binding sites. BMC Evol. Biol. 4, 42 (2004).
Article PubMed PubMed Central Google Scholar
Lässig, M. From biophysics to evolutionary genetics: statistical aspects of gene regulation. BMC Bioinformatics 8, 1–21 (2007).
Article Google Scholar
Lynch, M. & Hagner, K. Evolutionary meandering of intermolecular interactions along the drift barrier. Proc. Natl Acad. Sci. USA 112, E30–E38 (2015).
Article ADS CAS PubMed Google Scholar
Tuğrul, M., Paixão, T., Barton, N. H. & Tkačik, G. Dynamics of transcription factor binding site evolution. PLoS Genet. 11, e1005639 (2015).
Article PubMed PubMed Central Google Scholar
Poelwijk, F. J., Kiviet, D. J. & Tans, S. J. Evolutionary potential of a duplicated repressor-operator pair: simulating pathways using mutation data. PLoS Comput. Biol. 2, e58 (2006).
Article ADS PubMed PubMed Central Google Scholar
Burda, Z., Krzywicki, A., Martin, O. C. & Zagorski, M. Distribution of essential interactions in model gene regulatory networks under mutation-selection balance. Phys. Rev. E 82, 011908 (2010).
Article ADS CAS Google Scholar
Von Hippel, P. H. & Berg, O. G. On the specificity of DNA-protein interactions. Proc. Natl Acad. Sci. USA 83, 1608 (1986).
Article ADS Google Scholar
Gerland, U., Moroz, J. D. & Hwa, T. Physical constraints and functional characteristics of transcription factor-dna interaction. Proc. Natl Acad. Sci. USA 99, 12015–12020 (2002).
Article ADS CAS PubMed PubMed Central Google Scholar
Bintu, L. et al. Transcriptional regulation by the numbers: models. Curr. Opin. Genet. Dev. 15, 116–124 (2005).
Article CAS PubMed PubMed Central Google Scholar
Desai, M. M. & Fisher, D. S. Beneficial mutation-selection balance and the effect of linkage on positive selection. Genetics 176, 1759–1798 (2007).
Article PubMed PubMed Central Google Scholar
Kimura, M. On the probability of fixation of mutant genes in a population. Genetics 47, 713–719 (1962).
CAS PubMed PubMed Central Google Scholar
Gillespie, J. H. Population Genetics: A Concise Guide, 2nd edn (The Johns Hopkins University Press, 2004).
Sella, G. & Hirsh, A. E. The application of statistical physics to evolutionary biology. Proc. Natl Acad. Sci. USA 102, 9541–9546 (2005).
Article ADS CAS PubMed PubMed Central Google Scholar
Kauffman, S. & Levin, S. Towards a general theory of adaptive walks on rugged landscapes. J. Theor. Biol. 128, 11–45 (1987).
Article MathSciNet CAS PubMed Google Scholar
Kryazhimskiy, S., Tkačik, G. & Plotkin, J. B. The dynamics of adaptation on correlated fitness landscapes. Proc. Natl Acad. Sci. USA 106, 18638–18643 (2009).
Article ADS CAS PubMed PubMed Central Google Scholar
Schuster, P., Fontana, W., Stadler, P. F. & Hofacker, I. L. From sequences to shapes and back: a case study in RNA secondary structures. Proc. R. Soc. Lond. B: Biol. Sci. 255, 279–284 (1994).
Article ADS CAS Google Scholar
Adams, R. M., Mora, T., Walczak, A. M. & Kinney, J. B. Measuring the sequence-affinity landscape of antibodies with massively parallel titration curves. eLife 5, e23156 (2016).
Article PubMed PubMed Central Google Scholar
Podgornaia, A. I. & Laub, M. T. Pervasive degeneracy and epistasis in a protein-protein interface. Science 347, 673–677 (2015).
Article ADS CAS PubMed Google Scholar
Aguilar-Rodrguez, J., Payne, J. L. & Wagner, A. A thousand empirical adaptive landscapes and their navigability. Nat. Ecol. Evol. 1, 0045 (2017).
Article Google Scholar
Friedlander, T., Mayo, A. E., Tlusty, T. & Alon, U. Evolution of bow-tie architectures in biology. PLoS Comput. Biol. 11, e1004055 (2015).
Article ADS PubMed PubMed Central Google Scholar
Tkačik, G., Prentice, J. S., Balasubramanian, V. & Schneidman, E. Optimal population coding by noisy spiking neurons. Proc. Natl Acad. Sci. USA 107, 14419–14424 (2010).
Article ADS PubMed PubMed Central Google Scholar
Friedlander, T., Prizak, R., Guet, C. C., Barton, N. H. & Tkačik, G. Intrinsic limits to gene regulation by global crosstalk. Nat. Commun. 7, 12307 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Shultzaberger, R. K., Maerkl, S. J., Kirsch, J. F. & Eisen, M. B. Probing the informational and regulatory plasticity of a transcription factor DNA-binding domain. PLoS Genet. 8, e1002614 (2012).
Article CAS PubMed PubMed Central Google Scholar
Rowland, M. A. & Deeds, E. J. Crosstalk and the evolution of specificity in two-component signaling. Proc. Natl Acad. Sci. USA 111, 5550–5555 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar
Eldar, A. Social conflict drives the evolutionary divergence of quorum sensing. Proc. Natl Acad. Sci. USA 108, 13635–13640 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
Aakre, C. D. et al. Evolving new protein-protein interaction specificity through promiscuous intermediates. Cell 163, 594–606 (2015).
Article CAS PubMed PubMed Central Google Scholar
Sengupta, A. M., Djordjevic, M. & Shraiman, B. I. Specificity and robustness in transcription control networks. Proc. Natl Acad. Sci. USA 99, 2072–2077 (2002).
Article ADS CAS PubMed PubMed Central Google Scholar
de Vos, M. G. J., Dawid, A., Sunderlikova, V. & Tans, S. J. Breaking evolutionary constraint with a tradeoff ratchet. Proc. Natl Acad. Sci. USA 112, 14906–14911 (2015).
Article ADS PubMed PubMed Central Google Scholar
Gillespie, D. T. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comput. Phys. 22, 403–434 (1976).
Article ADS MathSciNet CAS Google Scholar

Download references

Acknowledgements

We thank the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme (FP7/2007–2013) under REA grant agreement Nr. 291734 (T.F.), ERC grant Nr. 250152 (N.B.), and Austrian Science Fund grant FWF P28844 (G.T.).

Author information

Tamar Friedlander
Present address: The Robert H. Smith Institute of Plant Sciences and Genetics in Agriculture, Faculty of Agriculture Hebrew University of Jerusalem, P.O. Box 12, Rehovot, 7610001, Israel
Tamar Friedlander and Roshan Prizak contributed equally to this work.

Authors and Affiliations

Institute of Science and Technology Austria, Am Campus 1, A-3400, Klosterneuburg, Austria
Tamar Friedlander, Roshan Prizak, Nicholas H. Barton & Gašper Tkačik

Authors

Tamar Friedlander
View author publications
You can also search for this author in PubMed Google Scholar
Roshan Prizak
View author publications
You can also search for this author in PubMed Google Scholar
Nicholas H. Barton
View author publications
You can also search for this author in PubMed Google Scholar
Gašper Tkačik
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

T.F., R.P., N.H.B., and G.T. designed the study. T.F. and R.P. carried out the calculations and analysis. T.F. and G.T. wrote the paper.

Corresponding author

Correspondence to Gašper Tkačik.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Supplementary Information

Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Friedlander, T., Prizak, R., Barton, N.H. et al. Evolution of new regulatory functions on biophysically realistic fitness landscapes. Nat Commun 8, 216 (2017). https://doi.org/10.1038/s41467-017-00238-8

Download citation

Received: 15 January 2017
Accepted: 13 June 2017
Published: 09 August 2017
DOI: https://doi.org/10.1038/s41467-017-00238-8

This article is cited by

The role of promiscuous molecular recognition in the evolution of RNase-based self-incompatibility in plants
- Keren Erez
- Amit Jangid
- Tamar Friedlander
Nature Communications (2024)
Robustness and innovation in synthetic genotype networks
- Javier Santos-Moreno
- Eve Tasiudi
- Yolanda Schaerli
Nature Communications (2023)
Asymmetry between Activators and Deactivators in Functional Protein Networks
- Ammar Tareen
- Ned S. Wingreen
- Ranjan Mukhopadhyay
Scientific Reports (2020)
Survival of the simplest in microbial evolution
- Torsten Held
- Daniel Klemmer
- Michael Lässig
Nature Communications (2019)
Towards a Dynamic Interaction Network of Life to unify and expand the evolutionary theory
- Eric Bapteste
- Philippe Huneman
BMC Biology (2018)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.