Modeling genome-wide enzyme evolution predicts strong epistasis underlying catalytic turnover rates

Systems biology describes cellular phenotypes as properties that emerge from the complex interactions of individual system components. Little is known about how these interactions have affected the evolution of metabolic enzymes. Here, we combine genome-scale metabolic modeling with population genetics models to simulate the evolution of enzyme turnover numbers (kcats) from a theoretical ancestor with inefficient enzymes. This systems view of biochemical evolution reveals strong epistatic interactions between metabolic genes that shape evolutionary trajectories and influence the magnitude of evolved kcats. Diminishing returns epistasis prevents enzymes from developing higher kcats in all reactions and keeps the organism far from the potential fitness optimum. Multifunctional enzymes cause synergistic epistasis that slows down adaptation. The resulting fitness landscape allows kcat evolution to be convergent. Predicted kcat parameters show a significant correlation with experimental data, validating our modeling approach. Our analysis reveals how evolutionary forces shape modern kcats and the whole of metabolism.


Supplementary
1 3-deoxy-D-manno-octulosonic acid transferase b3633 Supplementary Table 2: Sensitivity of correlation between experimental data and model predictions against identity and size of the unconstrained (evolving) set of reactions. A random set of unconstrained reactions of size 373 (equal to the size used for simulation presented in Figure 4), 750, and 1000 was set. The average evolved kcat vector across six replicates per set size was compared to experimental data. Aerobic growth on glucose was used as growth condition. Reactions unchanged after simulation under this constant condition were removed when evolved kcat was below 10 -2 . Note that the number of comparable observations are significantly lower than in the original analysis presented in Figure 4 because of the random nature of the evolving reaction set. The p-values were calculated as described in in the Methods section.   As fixation probabilities diminish, MCMC becomes computationally infeasible for tracing the later stages of evolution. In order to investigate the effect of additional mutations computationally, a greedy simulation approach is used. At each iteration, we compute the selection coefficient s for an individual doubling (i.e., α=2) of all unconstrained kcats. The "mutation" that yields the highest s is fixed and the algorithm is repeated. Two thousand of those fixation events are shown for each replicate. The black line indicates the theoretical limit for s below which mutations behave effectively neutral. The final growth rates after iteration 2000 are: 0.492 h -1 , 0.496 h -1 , and 0.497 h -1 , for replicate 1, 2, and 3, respectively.
Supplementary Figure 6: Constraining multi-functional genes abolishes jump dynamics. All reactions catalyzed by genes that were associated with "jumps" in adaptation trajectories (as listed in Supplementary Table 1)  To determine the effect of the magnitude of constrained kcats on end point growth rates, we simulate three replicates each for the original case (all constrained kcats set to 13.7s -1 ) and for two reduced cases (1.37s -1 and 0.137s -1 ). To speed up convergence and reduce variance across replicates we removed multifunctional reactions as explained in Supplementary Figure 6. We find that the kcat of the constrained fraction is a major determinant of the final growth rate.  Figure 2. The convergence of replicates in terms of growth rate and kcat suggests a smooth, single-peaked phenotypic fitness landscape. In order to investigate the structure of the fitness landscape between simulated end-points, we average kcat vectors between pairs of replicates and compute the corresponding growth rate. This "hybridization" indicates that no "fitness valleys" exists between end points, supporting the idea of a single-peaked landscape.
Supplementary Figure 13: Comparison between experimental data and end point kcat predictions from simulations where initial states are sampled from an empirical distribution. Initial kcats of five replicates were drawn from a log-normal distribution with log-scaled mean 19.12s -1 and log-scaled standard deviation 18.88s -1 , as determined from the in vitro data in 2 that could be mapped to the model. To avoid numerical problems in the simulation, values were capped below 1e-3 and above 1e5. Lower correlation than in the original simulations ( Figure 4) and over-estimation of experimental data is found. The likely cause for this decreased agreement with experimental data are unrealistically high kcats in initial states, that then have a low probability of deterioration because, even in un-used reactions, decreases in kcats act at best as neutral mutations. Horizontal error bars in show the standard deviation across five simulated replicates. The p-values were calculated as described in in the Methods section.
Supplementary Figure 14: Comparison of end point kcat vectors when initial states are sampled from an empirical distribution (see Supplementary Figure 12 for details).
Supplementary Figure 15: Random perturbation of network stoichiometry and biomass equation abolishes correlation with experimental data. In order to test the effect of network structure and flux distribution on the ability of the model to explain experimental data, we replace stoichiometric coefficients in the metabolic model iJO1366 with integers drawn from a binomial distribution with five trials of success probability ½, where the original sign of the coefficients remains intact. Furthermore, we perturb the flux distribution of the model by replacing the original biomass function of the model with a random sample of consumed components of the same cardinality as the original biomass function (68 metabolites), and with coefficients mimicking the distribution of the original biomass function of the model (normal distribution with mean 1.60 and standard deviation 8.74). This process was repeated until a model with feasible growth was found. Protein molecular weights were set to the median of the model enzymes (=44kDa). kcat evolution was simulated analogously to that presented in Figure 2, i.e., 5e7 mutations for aerobic growth on glucose were simulated for five replicates with different reaction stoichiometries and biomass functions. Points show averages over these replicates, with error bars showing standard deviations in case more than one replicate showed significant evolution for that reaction. The p-values were calculated as described in in the Methods section. The genome-scale simulation assumes optimal gene expression patterns during the adaptation process.

Supplementary
We can impose this assumption by applying a uniform distribution of activities as Furthermore, a proteome constraint C [mol gDW-1] is applied as The value of kcat1 [s-1] after n mutations of equal multiplicative effect α is where k 0 cat1 represents the ancestral turnover number.
Combining these equations and re-arranging yields (1) As kcat2 is constant, E2 is proportional to the systems fitness. When mutations acting on kcat1 are beneficial (α>1), the development of the evolving system with increasing number of mutations takes a sigmoidal shape (Supplementary Fig. 17).
Supplementary Figure  Also note that with increasing improvement of kcat1 (α>1), the proteome investment into the biophysically constrained second reaction approaches the total proteome C: Because of the finite population size in real organisms, E2 will not approach C indefinitely; when the fitness gain is too low dynamics become dominated by random drift 3,4 .

A model of additive mutations
We further analyze a similar model, where mutations act on kcat1 in an additive manner; This leads to the following expression for E2: Again, this additive model shows saturation and diminishing returns ( Supplementary Fig. 18).

The effect of the number of evolving reactions
To extend the multiplicative model with multiple reactions that act in a linear pathway, we make the simplifying assumption that p evolving reactions share a common kcat, k 0 cat1, while the constrained set of q reactions shares kcat2: Eevolving denotes the proteome fraction in evolving enzymes, while Econstrained represents the biophysically constrained fraction. Uniform distribution of enzyme activity is applied as: The protein investment in the constrained fraction is then given by: (3) As expected, diminishing returns are still occurring, and the total achievable flux of the pathway decreases with the number of constrained reactions ( Supplementary Fig. 19).
Supplementary This confirms our intuition: the achievable fitness decreases as the fraction of constrained reactions grows. Supplementary Figure 3.

Determinants of the selection coefficient s
We next determine the selection coefficient s for a novel mutation that changes the kcat of one reaction of the evolving set by a factor α.