Pervasive cooperative mutational effects on multiple catalytic enzyme traits emerge via long-range conformational dynamics

Multidimensional fitness landscapes provide insights into the molecular basis of laboratory and natural evolution. To date, such efforts usually focus on limited protein families and a single enzyme trait, with little concern about the relationship between protein epistasis and conformational dynamics. Here, we report a multiparametric fitness landscape for a cytochrome P450 monooxygenase that was engineered for the regio- and stereoselective hydroxylation of a steroid. We develop a computational program to automatically quantify non-additive effects among all possible mutational pathways, finding pervasive cooperative signs and magnitude epistasis on multiple catalytic traits. By using quantum mechanics and molecular dynamics simulations, we show that these effects are modulated by long-range interactions in loops, helices and β-strands that gate the substrate access channel allowing for optimal catalysis. Our work highlights the importance of conformational dynamics on epistasis in an enzyme involved in secondary metabolism and offers insights for engineering P450s.


Introduction
Directed evolution constitutes a powerful tool for optimizing protein properties including activity, substrate scope, selectivity, stability, allostery or binding affinity. By applying iterative rounds of gene mutagenesis, expression and screening (or selection), proteins have been engineered for developing more efficient industrial biocatalytic processes [1][2][3][4] . Directed evolution has also provided important insights into the relationship between protein sequence and function [4][5][6] , yet understanding the intricacies of non-additive epistatic effects remains a challenge 7 . Epistasis means that the phenotypic consequences of a mutation depend on the genetic background [8][9][10][11] . Epistatic effects can be negative 3 (antagonistic/deleterious) or positive (synergistic/cooperative) if the respective predictive value is smaller or greater in sign/magnitude than the expected value under additivity.
Based on studies of natural and laboratory protein evolution, negative 10 or positive 11 epistasis is more widespread than originally thought 7 . Importantly, positive epistasis increases the evolution of new protein functions because it allows access to mutational pathways that avoid deleterious downfalls. For fundamental and practical reasons, it is thus important to determine the existence, type and molecular basis of epistasis in protein evolution. Epistatic effects can arise between residues that are located closely or away from each other via long-range indirect interactions, both mechanisms involving sometimes direct or indirect substrate binding 11 . These global epistatic effects may be mediated by changes in the protein conformational dynamics.
Proteins have the inherent ability to adopt a variety of thermally accessible conformational states, which play a key role in protein evolvability and activity 12,13 . Along the catalytic cycle, enzymes can adopt multiple conformations important for substrate binding or product release 14,15 , and conformational change can be rate-limiting in some cases 16,17 .
Much debated is the existence of a link between active site dynamics and the chemical step 18,19 Some studies have suggested that mutations remote from the enzyme active site may directly impact the energetically accessible conformational states, thereby influencing catalysis [20][21][22] . This has been shown by means of crystal structures and NMR spectra of mutants along evolutionary pathways 20,23,24 together with computational assistance [25][26][27][28][29] .
Molecular Dynamics (MD) simulations allow the reconstruction of the enzyme conformational landscape, and how this is altered by mutations introduced by laboratory evolution [28][29][30] . Tuning the enzyme conformational dynamics has been recently identified as key for novel activity 21,24,29-31 . 4 Understanding the connection between conformational dynamics and epistasis has been limited to a few model enzymes (mainly beta-lactamases) and a single protein trait (usually activity) as a measure of fitness (this term originally refers to the reproductive success of organisms but it can be applied to protein activity, selectivity or stability) [32][33][34][35] . This contrasts with directed evolution where often two or more traits (e.g., activity and selectivity or stability) are sought for practical purposes 4,36 . Therefore, connecting epistasis to conformational dynamics increases our understanding of proteins. In turn, analysing non-additive epistatic effects can be expected to benefit in silico directed evolution 37 .
In the present work, we used a combination of enzyme kinetics and computational approaches to investigate epistatic effects and conformational dynamics in the stepwise evolution of a cytochrome P450 monooxygenase (CYP) engineered for the highly active, regio-and stereoselective oxidative hydroxylation of a steroid as a non-natural substrate 38 .
To determine epistatic effects effectively, we developed a computational program that can be applied to any protein and catalytic trait (https://epistasis.mutanalyst.com/). Unexpectedly, we found pervasive positive epistatic effects on multiple catalytic traits, with selectivity and activity being generally characterized by sign and magnitude epistasis. We found that the analysis of the link between protein epistasis and conformational dynamics reveals the increasing optimization of activity and selectivity along all evolutionary trajectories through fine tuning of loops, helices and beta-strands that gate active site entrance and modulate the active site by long-range networks of interactions. Our study offers guiding principles for the simultaneous engineering of both activity and selectivity in a model CYP member. 5

Multiple parameters define the biocatalytic landscape in P450 BM3
The haem-containing CYP super family is involved in the biosynthesis and degradation of organic molecules in a wide range of secondary metabolic pathways 39 and thus have many applications in biocatalysis 40,41 . Previously, we achieved the stereo-and regioselective hydroxylation of testosterone (1) by evolving the self-sufficient Bacillus megaterium cytochrome P450 BM3 monooxygenase 38 . P450 BM3 is one of the most active and versatile CYPs that oxidizes long fatty acids as the natural substrates, and there are various 3D structures of the haem domain alone without the reductase domain 41 . However, wild type (WT) does not accept steroids, which is the reason why we chose mutant F87A as the starting enzyme. While F87A accepts 1, it provides in a whole cell system only ~20% conversion with formation of a 1:1 mixture of 2β-hydroxytestosterone (2) and 15βhydroxytestosterone (3) 38 . Combinatorial saturation mutagenesis at the randomization site R47/T49/Y51 allowed the evolution of mutant R47I/T49I/Y51I/F87A (III) displaying 94% 2βselectivity and 68% conversion of 1 in whole cell reactions 38 (Fig. 1a). The mechanism is known to involve a radical process in which the catalytically active haem-Fe=O (Cpd I) abstracts an H-atom from aliphatic C-H followed by a fast C-O forming bond formation, which requires a precise substrate positioning, as in other cases 42 (Supplementary Fig. 1).
The three mutated residues are located next to each other (distances of C α is ~6 Å) lining the large binding pocket, but relatively far away (~15-20 Å) from haem-Fe=O, assuming the absence of dynamic effects (Fig. 1b). Complete deconvolution of variant III starting from parental F87A entails 3! = 6 theoretically possible upward pathways, which we constructed by generating the 6 intermediate mutants (Fig. 1c). The key question is how these residues determine selectivity and activity, and whether they interact epistatically.  Table 1).
The most 2β-selective variants (~67-91%) contain mutation Y51I (--I, I-I, -II and III), while the remaining ones proved to be 15β-selective, with substrate conversion being highest (35%) in mutants III and -II, and poor (~6-10%) in the remaining ones (Fig. 2a). NADPH leaks without substrate in all mutants except -II and III, which display a respective 2-and 3-fold increased NADPH consumption upon 1 addition (Fig. 2b). Mutants -II and III also showed a respective ~5-and ~10-fold improvement in product formation rates (PFR) compared to the remaining variants (Fig. 2c) 44 . Low CE values of 15-30% were found for all mutants, except for -II and III that display higher values of 37% (Fig. 2d). The total turnover number (TTN) is highest in mutants -II and III (Fig. 2e), whereas the total turnover frequency (TTF), PFR and NADPH consumption rate are highest in III (Fig. 2f).  Notably, substrate rotation inside the active site pocket is only observed in mutant III, which presents a substantially wider active site pocket as compared to the other variants: the active site volume in the ---variant is 89 Å 3 , which is expanded to 235 Å 3 in III ( Supplementary Fig. 8). We hypothesised that in all variants, except III, selectivity must be determined by the orientation adopted by the substrate while accessing the haem cavity. To reconstruct the substrate binding process in P450 BM3 , we placed substrate 1 in the bulk solvent and started unbiased MD simulations followed by accelerated Molecular Dynamics Our aMD simulations show that the orientation of the substrate when accessing the catalytic site during the second step of the binding pathway dictates selectivity. In the productive trajectory corresponding to mutant I--, 1 accesses the haem with the correct orientation for 15β-hydroxylation ( Fig. 4b and Supplementary Fig. 9). In this case, residue Y51 establishes a hydrogen bond with the carbonyl group of 1 ( Supplementary Fig. 10), constraining the substrate in a such way that it can only progress into the active site 12 pocket pointing its C15 ahead towards haem-Fe=O 47 . Thus, Y51 is instrumental in promoting the observed C15-selectivity in ---, and in I--, -I-and II-variants. Additionally, the higher C2-selectivity observed in variant III occurs due to the flipping and motion of the β4 sheet, destabilizing pose 15 while favouring pose 2 (Fig. 4c).
Interestingly, the analysis of the most relevant conformational changes in each independent variant through Principal Component Analysis (PCA) indicates that the most active mutant III shows the highest flexibility of the β4 sheet (Fig. 4d). These flexible regions, responsible of controlling substrate binding as described above, influence activity, as mutant III shows the highest TTF, NADPH consumption rate and PFR numbers. Thus, favouring a more efficient substrate binding in a catalytically competent pose increases enzyme TTF, while NADPH leak is reduced due to a more efficient interaction between the substrate and the catalytically active Fe=O species once generated.  We quantified all amino acid interactions among all 6 trajectories leading from parent ---to mutant III for multiple parameters (Table 1) Table 7). Finally, coupling efficiency shows 6 cases of synergistic epistatic effects and 1 case of antagonistic epistatic effects (Supplementary Table 8). These results indicate that an efficient formation of 2β-hydroxytestosterone I--+ -I-II-A +ME -ME -ME -ME -ME +RSE B -I-+ --I -II +SE +SE +ME +ME +ME +ME +SE B I--+ --I I-I +RSE +SE -ME -ME -ME -ME -SE B I--+ -II III +SE +SE A +ME +ME +ME +SE B II-+ --I III +RSE +SE +ME +ME +ME +ME +ME B I-I + -I-III +SE +SE +ME +ME +ME +ME +SE T I--+ -I-+ --I III +SE +SE +ME +ME +ME +ME +SE a. Binary (B) and tertiary (T) combinations.
b. See Fig. 1c for nomenclature of mutations and mutants.

Conformational dynamics shape the evolution of fitness landscapes
The complete deconvolution of a multi-mutational variant enables the exploration of all possible pathways from parental enzyme to the evolved mutant, thus determining a full multidimensional fitness landscape. To explore the step-wise accessibility in the evolution of parental ---towards III, we constructed a fitness "pathway" landscape 51 , which for the first time focuses on activity and on selectivity (Fig. 6a). This system is a 4-dimensional See Fig. 1d for mutant abbreviations.
To identify the most important conformational changes in all evolutionary pathways, and to describe how distal mutations influence them, we performed extensive MD simulations in the absence of substrate of each variant and applied the dimensionality reduction technique PCA 28 to the whole dataset ( Fig. 6c and 7a). A conformational population analysis resulting from all the accumulated simulation data was generated in terms of 22 principal components (PC) 1 and 3, which describe the first and third most important conformational differences among all variants (for PC2 see Supplementary Fig. 13).
Notably, a clear distinction between 2-and 15β-selective mutants is revealed through their separation with respect to PC1 (x-axis), suggesting that changes in selectivity are linked to the impact that the introduced mutations have on the enzyme conformational dynamics (Fig. 6c). These conformational changes mainly involve the G helix, the F-G loop, the  which residues are those that are more important for the observed conformational changes, which in this case are associated with different selectivities and activities 21 . In the parent ---enzyme, the generated SPM identifies residues Y51 as well as V78 and A330, known from earlier studies 38 , to be important for enzyme activity and to be interconnected in terms of Cα correlated movements, thus highly contributing to the enzyme inactive-toactive conformational interconversion. This highlights why these three distal positions are found to be key during the evolutionary pathway for improving catalysis, in line with what we observed for the laboratory-evolved retro-aldolases 21 . Importantly, the SPM also describes strong connections between all the five-stranded β1 sheet with the β4-2 strand and the B' helix, which we showed to be crucial for substrate binding and gating (Fig. 7d).
This long-distance communicating pathway between β1 and β4 sheets directly relates the mutated positions on β1-4 strand (positions R47, T49 and Y51) and the increased flexibility of the β4 sheet. This shows how evolutionary pathways take advantage of networks of residue-residue interactions to fine tune the conformational dynamics along the evolutionary pathways for improving enzyme function. 25

Discussion
The identification of epistasis and its molecular mechanism are crucial for understanding protein function, but these are hardly ever explored in directed or natural evolution studies of multiparametric optimization. To the best of our knowledge, only the study by Hartl and colleagues on fitness landscapes of drug resistance towards trimethoprim fits such a description, in which activity, binding and folding stability were shown to depend to some extent on epistasis 52 . Other studies dealing with directed evolution of activity and selectivity in plant sesquiterpene synthases 53 and P450 BM3 54 do not consider epistatic effects, while "stability-mediated epistatic effects" were observed in a P450 BM3 study 55 some years later 5 .
The construction of fitness landscapes using a single catalytic parameter has been reported in two main research areas. While such landscapes have revealed that usually many pathways are accessible in laboratory evolution of enzymes as catalysts in organic chemistry 4,7,51,56 , different conclusions have been made in evolutionary biology [8][9][10][11] . In the present study, we observed that there are no direct accessible pathways to activity and that only a few trajectories (2/6) are accessible to both selectivity and activity. Interestingly, the two accessible pathways for selectivity correspond to mutation Y51I. The addition of mutation R47I has almost no effect on activity and selectivity, while mutation T49I, which is closer to Y51I, significantly improves both parameters as it alters the enzyme conformational dynamics. T49I and Y51I enhance the flexibility of the β4 sheet, and both combined with R47I reshape the active site for enhanced 2β-hydroxylation. The triple mutant III excels in all parameters compared to all double and single mutants.
Unexpectedly, upon going from the "parent" enzyme ---to mutant III, cooperative effects at each step in the evolution of activity and selectivity remain pervasive. Residues R47I, T49I 26 and Y51I are located at the entrance of a long substrate channel far away from the active site 41 . Since the mutated residues were not observed to interact directly with the substrate in our MD simulations, we propose that the observed epistatic effects, which are mediated by long-range interactions, can occur via one main mechanism: Direct effects between mutations, but no direct interaction between the substrate and the mutations 11 .
Our computational exploration of the mutation-induced conformational changes on F87A variants provide key insights concerning the importance for P450 BM3 46 . It should be also mentioned that P450 cam complexed with its redox partner adopts an open conformation that stabilizes the active site key for the proton relay network 60 . 27 Using SPM analysis, the most important positions that participate in the open/closed conformational conversions that dictate selectivity and activity were identified. Of relevance is that the key residue Y51 found to be essential for both activity and selectivity in this study, is contained in the SPM path, as well as the previously described V78 and A330 positions 38 . SPM also highlights a long-distance communicating pathway between β 4 and β 1 where positions R47, T49, and Y51 are located, which is exploited along the evolutionary pathway for altering BM3 protein function.
This study evidences that epistasis is intrinsically linked to conformational dynamics, which fine-tunes multiple functions in a protein involved in secondary metabolism. Our findings on the conformational changes connected to CYP activity and selectivity and residue networks that modulate such conformational conversions can be expected to facilitate future rational evolution of these enzymes.

Competing financial interests
The authors declare no competing financial interests.

Additional information
Supplementary information is available in the online version of the paper.

Chemicals, materials and software
All commercial chemicals were purchased with the highest purity grade (e.g., HPLC) from for the thermodynamic cycles is GraphPad Prism version 6 (La Jolla, US).

P450 BM3 -based oxidation reactions using purified enzymes
Biotransformation reactions were performed as described earlier 36 30 VWR, Germany) into a new 500 μL MTP (Nunc). The MTPs, which were closed using silicon lids for the corresponding plates, were stored at 4°C prior to screening.

Steroid hydroxylation screening by HPLC
A LC-2010 HPLC system (Shimadzu, Japan) equipped with four MTP racks was used employing a reverse-phase "250 Eclipse XDB" C18 column of 250 mm (1.8 μM size particle) together with a corresponding pre-column bought from Agilent (Waldbronn, Germany) as stationary phase and installed in the oven at 40°C. The mobile phase was composed of a mixture of high-purity water generated from the local deionized water supply using a TKA MicroLab water purification system, acetonitrile (CH 3 CN) and methanol (MeOH). For testosterone (1)

Large-scale protein expression and purification
The P450 BM3 mutants were inoculated into 4 mL LB Kan50 broth and cultured overnight at Lipidex 1000 (Perkin Elmer) chromatography was conducted. 10 mL of Lipidex resin stored in methanol was used for column packing, which was subsequently washed with 10 column volumes of water and 10 column volumes of buffer (25 mM KPi, pH 7.5). After that, the protein was applied onto the column. The column was then capped to leave the protein in contact with resin at room temperature for 1 h, allowing hydrophobic compounds to bind to the resin. The protein was completely eluted from the Lipidex resin with buffer (25 mM KPi, pH 7.5) and the column was cleaned with at least 10 column volumes of methanol.
The purified protein was pooled, and the buffer was exchanged to 100 mM KPi (pH 8.0) by ultrafiltration using a 50 kDa Amicon Ultra centrifugal filter, and then concentrated to 1 mL and stored in -80°C for further use. An aliquot was thawed at room temperature and enzyme concentration was determined by CO difference spectrum analysis prior to usage.

The enzyme concentration determined for all intermediate mutants is shown in
Supplementary Table 1.
Determination of kinetic parameters using isolated enzymes 32 The kinetic experiments were performed using a JASCO V-650 spectrophotometer (JASCO International CO., LTD, Japan) equipped with a PAC-743 Peltier temperature control unit and UV-Vis-NIR Spectra Manager software II. All assays were performed in 100 mM potassium phosphate buffer (pH 8.0) at 25°C using quartz cuvettes adapted for magnetic stirring (900 rpm). The initial rate based on NADPH consumption was determined by measuring NADPH depletion monitored at 340 nm (ε = 6.22 mM