Introduction

Directed evolution constitutes a powerful tool for optimizing protein properties, including activity, substrate scope, selectivity, stability, allostery or binding affinity. By applying iterative rounds of gene mutagenesis, expression and screening (or selection), proteins have been engineered for developing more efficient industrial biocatalytic processes1,2,3,4. Directed evolution has also provided important insights into the relationship between protein sequence and function4,5,6, yet understanding the intricacies of non-additive epistatic effects remains a challenge7. Epistasis means that the phenotypic consequences of a mutation depend on the genetic background8,9,10,11. Epistatic effects can be negative (antagonistic/deleterious) or positive (synergistic/cooperative) if the respective predictive value is smaller or greater in sign/magnitude than the expected value under additivity. Sign epistasis (SE) occurs when a mutation has a deleterious or beneficial effect alone but an opposite effect when combined with other mutation(s), whereas in magnitude epistasis (ME) a mutation has an effect of the same sign (deleterious or beneficial) in isolation and in combination with other mutation(s), differing only in magnitude. Based on studies of natural and laboratory protein evolution, negative10 or positive11 epistasis is more widespread than originally thought7. Importantly, positive epistasis promotes the evolution of new protein functions because it allows access to mutational pathways that avoid deleterious downfalls. On the other hand, negative epistasis has been associated with a higher tolerance for mutations, which is important because this mutational robustness enables protein stability and evolution12. For fundamental and practical reasons, it is thus important to determine the existence, type and molecular basis of epistasis in protein evolution.
Epistatic effects can arise between residues that are located close to each other or far apart via long-range indirect interactions, with both mechanisms sometimes involving direct or indirect substrate binding11. These global epistatic effects may be mediated by changes in the protein conformational dynamics.

Proteins have the inherent ability to adopt a variety of thermally accessible conformational states, which play a key role in protein evolvability and activity13,14. Along the catalytic cycle, enzymes can adopt multiple conformations important for substrate binding or product release15,16, and conformational change can be rate-limiting in some cases17,18. Much debated is the existence of a link between active site dynamics and the chemical step19,20. Some studies have suggested that mutations remote from the enzyme active site may directly impact the energetically accessible conformational states, thereby influencing catalysis21,22,23. This has been shown by means of crystal structures and nuclear magnetic resonance (NMR) spectra of mutants along evolutionary pathways21,24,25 together with computational assistance26,27,28,29,30. Molecular dynamics (MD) simulations, which are highly complementary to NMR analyses31, allow the partial reconstruction of the enzyme conformational landscape, and how this is altered by mutations introduced by laboratory evolution29,30,32. Tuning the enzyme conformational dynamics can play an important role in the emergence of novel activities22,25,30,32,33.

The connection between conformational dynamics and epistasis has been studied in proline isomerase (cyclophilin A)24, phosphotriesterase34 and β-lactamases35,36,37. For example, negative SE between two distal mutations limited dynamics of active site loops mediating substrate accessibility in a β-lactamase35. These studies provide fascinating insights, but they are limited to a single protein trait (usually activity) as a measure of fitness. This term originally refers to the reproductive success of organisms, but it can be applied to protein activity, selectivity or stability34,35,36,37. This contrasts with directed evolution where often two or more traits (e.g. activity and selectivity or stability) are sought for practical purposes4,38. Therefore, connecting epistasis to conformational dynamics increases our understanding of proteins. In turn, analysing non-additive epistatic effects can be expected to benefit in silico-directed evolution39.

In the present work, we used a combination of enzyme kinetics and computational approaches to investigate epistatic effects and conformational dynamics in the stepwise evolution of a cytochrome P450 monooxygenase (CYP) engineered for the highly active, regioselective and stereoselective oxidative hydroxylation of a steroid as a non-natural substrate40.

To determine epistatic effects effectively, we developed a Python-based script and freely accessible web-app (https://epistasis.mutanalyst.com/), which can be used for any enzyme and catalytic trait (or for any protein and parameter). Unexpectedly, we found pervasive positive epistatic effects on multiple catalytic traits, with selectivity and activity being generally characterized by SE and ME. We found that the analysis of the link between protein epistasis and conformational dynamics reveals the increasing optimization of activity and selectivity along all evolutionary trajectories through fine tuning of loops, helices and β-strands that gate active site entrance and modulate the active site by long-range networks of interactions. Our study offers guiding principles for the simultaneous engineering of both activity and selectivity in a model CYP member.

Results

Multiple parameters define the biocatalytic landscape in P450BM3

The CYP super protein family has >300,000 members that are involved, among others, in the biosynthesis of steroids, fatty acids and natural products as well as in the degradation of drugs in humans and of xenobiotics in the environment41,42. Thus, this is an important enzyme class with relevant applications in biocatalysis, biomedicine, pharmacology, toxicology and biotechnology42,43,44. Previously, we achieved the stereoselective and regioselective hydroxylation of testosterone (1) by evolving the self-sufficient Bacillus megaterium cytochrome P450BM3 monooxygenase40. P450BM3 is one of the most active and versatile CYPs that oxidizes long-chain fatty acids as the natural substrates, and there are various three-dimensional structures of the haem domain alone without the reductase domain45. However, wild type does not accept steroids, which is the reason why we chose mutant F87A as the starting enzyme. While F87A accepts 1, it provides in a whole-cell system only ~20% conversion with formation of a 1:1 mixture of 2β-hydroxytestosterone (2) and 15β-hydroxytestosterone (3)40. Combinatorial saturation mutagenesis at the randomization site R47/T49/Y51 allowed the evolution of mutant R47I/T49I/Y51I/F87A (III) displaying 94% 2β-selectivity and 67% conversion of 1 (1 mM) in 24 h whole-cell reactions40 (Fig. 1a). The mechanism is known to involve a radical process in which the catalytically active haem-Fe=O (Cpd I) abstracts an H atom from aliphatic C-H followed by a fast C-O bond formation, which requires a precise substrate positioning, as in other cases46 (Supplementary Fig. 1). The three mutated residues are located next to each other (Cα–Cα distances of ~6 Å) lining the large binding pocket but relatively far away (~15–20 Å) from haem-Fe=O, assuming the absence of dynamic effects (Fig. 1b). Complete deconvolution of variant III starting from parental F87A entails 3! = 6 theoretical pathways, which we constructed by generating the respective 6 intermediate mutants (Fig. 1c/d).
The key question is how these residues determine selectivity and activity and whether they interact epistatically.

Fig. 1: Model system based on P450BM3 as biocatalyst for the selective oxidation of a steroidal substrate.

a Testosterone (1) is selectively hydroxylated at position 2β (2) by mutant III (R47I/T49I/Y51I/F87A). b Active site of parent enzyme showing F87A mutation (black) above the haem (yellow) as well as WT residues R47, T49 and Y51 (red). The distances between the α-C atoms of the following pairs of residues are (Å): R47-T49 (7.0), R47-Y51 (13.4), T49-Y51 (7.0). Image and atom distance calculations were obtained with the PyMOL Molecular Graphics System, V. 1.5.0.4 (Schrödinger, LLC). An interactive figure of parental variant --- docked with 1 was created with Michelanglo68 (https://michelanglo.sgc.ox.ac.uk/r/p450) highlighting the mutated residues and secondary structures discussed in this work. c The 6 possible evolutionary trajectories between parental mutant F87A (---) and “triple” mutant III involve three “single” mutants I-- (R47I/F87A), -I- (T49I/F87A) and --I (Y51I/F87A) as well as three “double” mutants II- (R47I/T49I/F87A), I-I (R47I/Y51I/F87A) and -II (T49I/Y51I/F87A). d Mutant abbreviations. The signs − and + indicate that the respective mutation is absent and present, respectively.
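
The 3! = 6 trajectories and the eight enzymes they connect can be enumerated mechanically. The following is a minimal sketch using the three-character mutant codes from panel d; the function names are illustrative assumptions, and the order in which the pathways are printed is not meant to match the pathway numbering (1–6) used later in the text.

```python
from itertools import permutations

# Order of positions in the three-character code; F87A is present in every enzyme.
POSITIONS = ("R47I", "T49I", "Y51I")


def mutant_code(present):
    """Return the three-character code, e.g. {'T49I', 'Y51I'} -> '-II'."""
    return "".join("I" if m in present else "-" for m in POSITIONS)


def enumerate_pathways():
    """List every order of introducing the three mutations as a chain of codes."""
    pathways = []
    for order in permutations(POSITIONS):
        present = set()
        steps = [mutant_code(present)]  # parent '---'
        for mutation in order:
            present.add(mutation)
            steps.append(mutant_code(present))
        pathways.append(steps)
    return pathways


if __name__ == "__main__":
    for path in enumerate_pathways():
        print(" -> ".join(path))
```

Each pathway has four nodes (parent, one "single", one "double" and III), and the union of all nodes gives the 8 enzymes characterized in this work.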

All intermediate mutants were generated, overexpressed in Escherichia coli BL21-Gold(DE3) and purified (Supplementary Fig. 2). Parent F87A (---) and variant III were also included, resulting in a total of 8 enzymes. Using defined substrate and NADPH concentrations, multiple parameters were determined (Supplementary Note 1 and Supplementary Table 1).

The most 2β-selective variants (~67–91%) contain mutation Y51I (--I, I-I, -II and III), while the remaining ones proved to be 15β-selective, with substrate conversion being highest (35%) in mutants III and -II, and poor (~6–10%) in the remaining ones (Fig. 2a). NADPH leak rate without substrate is higher than NADPH consumption rate (NCR) in all mutants except -II and III, which display a respective <2- and <3-fold increased NCR (Fig. 2b). Mutants -II and III also showed a respective ~5- and ~10-fold improvement in product formation rates (PFRs) compared to the remaining variants (Fig. 2c), suggesting that variants -II and III have a good coupling efficiency (CE). CE describes how well the reductase domain delivers electrons from NADPH via the flavin cofactors to the substrate in the haem domain. A low CE value indicates futile NADPH usage, resulting in the formation of reactive species during the catalytic cycle (Supplementary Fig. 1) that can inactivate the enzyme47. Low CE values of 15–30% were found for all mutants, except for -II and III that display higher values of 37% (Fig. 2d). The total turnover number (TTN) is highest in mutants -II and III (Fig. 2e), whereas the total turnover frequency (TTF), PFR and NCR are highest in III (Fig. 2f).

Fig. 2: Multiple enzymatic parameters of deconvolution mutants.

a Selectivity and conversion data are obtained from HPLC data and shown in percentage. b NADPH leak and consumption rate were measured in the absence and presence of testosterone (1) substrate, respectively. c Product formation rate (PFR) is calculated by multiplying the NADPH consumption rate by coupling efficiency. d Coupling efficiency (CE) is the ratio between product formation and NADPH consumption, and it is reported in percentage. e TTN describes the total moles of products per moles of enzyme after NADPH depletion. f TTF normalizes TTN by time after NADPH depletion. See Fig. 1d for mutant abbreviations. Other products mainly include the 15β-alcohol and other regioisomers. The data represent the average of two independent experiments (n = 2). Source data are provided with this paper.
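
The relationships between the parameters defined in this caption can be written out as a short sketch. The function names, argument units and the numbers in the example are assumptions for illustration only; just the 37% coupling efficiency is taken from the text, and the NADPH consumption rate of 100 is a placeholder rather than a measured value.

```python
def coupling_efficiency(product_formed, nadph_consumed):
    """CE (%): product formed per NADPH consumed (same units for both)."""
    return 100.0 * product_formed / nadph_consumed


def product_formation_rate(ncr, ce_percent):
    """PFR: NADPH consumption rate multiplied by the coupling efficiency."""
    return ncr * ce_percent / 100.0


def total_turnover_number(product_mol, enzyme_mol):
    """TTN: moles of product per mole of enzyme after NADPH depletion."""
    return product_mol / enzyme_mol


def total_turnover_frequency(ttn, time):
    """TTF: TTN normalized by the time needed for NADPH depletion."""
    return ttn / time


if __name__ == "__main__":
    # Placeholder NCR of 100 (arbitrary units) combined with a CE of 37 %.
    print(product_formation_rate(ncr=100.0, ce_percent=37.0))
```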

Distal mutations enable conformational changes at the active site required for regioselectivity

To gain insights about the origin of selectivity and activity, we performed computational studies on all mutants. Given the identification of comparable reaction barriers for hydrogen atom abstraction from C2 and C15 by using Density Functional Theory (DFT) calculations on truncated models (difference of <1.0 kcal/mol, see Supplementary Note 2, Supplementary Fig. 3 and Supplementary Table 2), we carried out MD simulations. For each mutant, we started from pose 15 and from pose 2 (i.e. positioning C15 or C2 closer to the Fe=O, respectively) to analyse whether the binding pose of 1 in the active site determines the experimentally observed selectivity (Fig. 3 and Supplementary Note 3). Starting from parent ---, pose 2 (presenting C2 close to the catalytic Cpd I) and pose 15 (C15 close to Cpd I) generated from manual dockings are possible (Supplementary Figs. 4 and 5). Further analysis of these binding poses along MD simulations in --- indicates that substrate 1 in pose 15 explores near attack conformations (NACs)48 closer to the quantum mechanics-predicted ideal transition state geometry for H abstraction than in pose 2 (Fig. 3a), thus making pose 15 more productive towards 15β-hydroxylation. Introducing mutations R47I and/or T49I does not have any effect on selectivity (Fig. 3b, c, e), i.e. the selectivity is retained due to the catalytically competent conformation inherent in pose 15 along MD simulations (pose 2 adopts a reduced number of catalytically competent conformations). However, the picture completely changes when mutation Y51I is introduced: the substrate bound in pose 15 becomes unstable and leaves the active site in 1 out of 3 replicas (C2··O distances of >15 Å explored, Fig. 3d), whereas pose 2 is highly stabilized and explores short C2··O distances for the incipient C-H eventually leading to 2β-hydroxytestosterone in 2 out of 3 replicas (Fig. 3d and Supplementary Figs. 6 and 7).
As experimentally determined, 2β-selectivity is retained in variants I-I, -II and III that contain mutation Y51I (Fig. 3f–h). This is even more dramatic in variant III, in which pose 15 is highly unstable and 1 rapidly rotates to position C2 close to the catalytic Cpd I for 2β-hydroxylation (Supplementary Movie 1). Instead, pose 2 in variant III is stable and adopts near attack conformations in all MD replicas (Fig. 3d and Supplementary Figs. 6 and 7).

Fig. 3: Conformational population analysis of key geometric parameters for hydroxylation.

Distances determined between the oxygen atom of haem-Fe=O and the C-atom (C2 or C15) of 1 (x-axis) and angles formed by O(Fe=O) – (1)-H(C2/15) – (1)-C(2/15) (y-axis) from the first replica of the MD dataset of parent mutant (a), “single” mutants (b–d), “double” mutants (e–g) and “triple” mutant III (h) (see Supplementary Figs. 6 and 7 for additional replicas). Geometric parameters measured for C-2 and C-15 are shown in red and blue, respectively. The ideal distance and angle for the transition state TS (black dot) corresponds to the Density Functional Theory (DFT) optimized geometry for the C–H abstraction by haem-Fe=O using a truncated computational model (Supplementary Note 2). See Fig. 1d for mutant abbreviations.
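
The two geometric parameters plotted in this figure, the O···C distance and the O–H–C angle, can be computed for a single MD frame as sketched below. This is only an illustration with the Python standard library; the function names are assumptions, and the coordinates in the example are placeholders, not values taken from the simulations.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as (x, y, z) in Å."""
    return math.dist(a, b)


def angle_deg(a, vertex, c):
    """Angle a-vertex-c in degrees, e.g. O(Fe=O)-H-C with H as the vertex."""
    v1 = [p - q for p, q in zip(a, vertex)]
    v2 = [p - q for p, q in zip(c, vertex)]
    dot = sum(x * y for x, y in zip(v1, v2))
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


if __name__ == "__main__":
    # Placeholder coordinates (Å) for one frame: oxo oxygen, substrate H and C.
    oxo, hydrogen, carbon = (0.0, 0.0, 0.0), (1.3, 0.0, 0.0), (2.4, 0.0, 0.0)
    print(distance(oxo, carbon), angle_deg(oxo, hydrogen, carbon))
```

Applying both functions to every frame of a trajectory, once for C2 and once for C15, would give the two point clouds (red and blue) that are compared with the DFT transition-state geometry.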

Nevertheless, mutants -II and III differ only by the R47I mutation present in the latter, yet the re-orientation of the substrate from pose 15 to pose 2 is observed only during the MD simulation of mutant III. To further investigate the specific effect of the R47I mutation on substrate rotation inside the haem pocket, we performed a Principal Component Analysis (PCA) on the substrate-bound MD trajectories of mutants -II and III, finding that pc2 indeed describes an increased flexibility of residues A87, T260, G265 and T327 in mutant III, as compared to -II (Supplementary Fig. 8). Thus R47I may modulate, via a long-range conformational dynamic effect, the flexibility of such residues, which have been shown to be instrumental in promoting substrate re-orientation in mutant III (Supplementary Movie 2). Moreover, mutant III presents a substantially wider active site pocket as compared to the other variants: the active site volume in the --- variant is 89 Å3, which is expanded to 235 Å3 in III (Supplementary Fig. 9). We hypothesized that, in all variants, except III, selectivity must be determined by the orientation adopted by the substrate while accessing the haem cavity. Recently, Mondal et al. characterized the substrate recognition and binding pathway in related P450cam using MD simulations, showing the formation of a single key channel in which the substrate needs to reside in a long-lived intermediate state before reaching the catalytic iron-oxo species49.

To reconstruct the substrate-binding process in P450BM3, we placed substrate 1 in the bulk solvent and started unbiased MD simulations followed by accelerated molecular dynamics (aMD) simulations (Supplementary Note 4). Among independent MD and aMD trajectories for all variants (250 ns MD + 750 ns aMD), only a single trajectory by mutant I-- was observed to be productive where the substrate reached the haem active site (Fig. 4b). We observed a two-step binding mechanism in this trajectory: first, the carbonyl moiety of 1 enters channel 2a (Fig. 4a) and stays above the β1-2 strand, where residues R47I, T49I and Y51I are located, forming a long-lived substrate–enzyme bound intermediate. There, substrate 1 can reorient, although its access to the active site is restricted by the β4 sheet that acts as a gate. Second, a network of coupled conformational changes occurs simultaneously: the G helix adopts a bent conformation, which impacts the F helix, F–G loop and β1 sheet conformation, and in turn retracts the β4 sheet, allowing progression of 1 towards the catalytic centre (Supplementary Video 2). This two-step mechanism is similar to what Mondal et al. observed in P450cam49.

Fig. 4: Secondary structural elements determining regioselectivity and activity.

a Spheres indicate the channels observed in the WT crystal structure (PDB: 1FAG) and in the mutants of our MD simulations (red and blue colour). b Trajectories of 1 towards the active site of mutant I-- and binding of 1 above the haem. c Rotation of 1 from pose 15 to pose 2 in the active site of mutant III. β-Hydrogens belonging to C2 and C15 atoms are depicted in pale green and yellow colour, respectively. d Principal Component Analysis (pc2) of mutant III (APO). The thickness of the line is proportional to the motion and the colour scale varies from blue (minimum motion) to red (maximum motion). The β4 sheet is highlighted with a red circle. Standard nomenclature for channels69 and secondary structure elements70 is used.

Our aMD simulations indicate that the orientation of the substrate when accessing the catalytic site during the second step of the binding pathway dictates selectivity. Once inside the haem pocket, substrate rotation was not observed in mutant I--, thus predicting that the orientation assumed by the substrate when entering the haem site ultimately governs selectivity. In fact, the previous substrate-bound MD simulations suggest that only variant III has a sufficiently wide active site pocket for allowing substrate rotation. These findings indicate that, in all variants, except III, selectivity is determined by the orientation adopted by the substrate while entering the haem pocket. In the productive trajectory corresponding to mutant I--, 1 accesses the haem with the correct orientation for 15β-hydroxylation (Fig. 4b and Supplementary Fig. 10). In this case, residue Y51 establishes a hydrogen bond with the carbonyl group of 1 (Supplementary Fig. 11), constraining the substrate in such a way that it can only progress into the active site pocket pointing its C15 ahead towards haem-Fe=O50. Thus Y51 is instrumental in promoting the observed C15-selectivity in --- and in I--, -I- and II- variants. It should be noted that, in previous studies, R47 and especially Y51 were found to interact with the terminal end of long-chain fatty acids while bound at the P450BM3 active site51. Such a direct interaction between testosterone and Y51 is only possible at the pre-binding pocket, which is lost after the retreat of the β4 sheet, allowing substrate access to the haem pocket. Additionally, the higher C2-selectivity observed in variant III occurs due to the flipping and motion of the β4 sheet, destabilizing pose 15 while favouring pose 2 (Fig. 4c).

Interestingly, the analysis of the most relevant conformational changes in each independent variant through PCA predicts that the most active mutant III shows the highest flexibility of the β4 sheet (Fig. 4d). This higher flexibility related to activity has, however, no impact on the B’ helix and B-B’ loop conformational dynamics (Supplementary Figs. 12 and 13). These flexible regions, responsible for controlling substrate binding as described above, are likely to influence activity, as mutant III shows the highest TTF, NCR and PFR values. These findings suggest that favouring a more efficient substrate binding in a catalytically competent pose increases enzyme TTF, while NADPH leak is reduced due to a more efficient interaction between the substrate and the catalytically active Fe=O species once generated.

Pervasive epistatic effects on multiple parameters are cooperative

According to Tokuriki11 and Bendixsen et al.52, non-additive mutational effects can occur in different forms (Fig. 5), which can be calculated with additivity equations (Supplementary Note 5). Aiming at exploring the existence of epistatic effects in an effective manner, we developed and applied a Python-based computational program to automatically determine the type and intensity of amino acid interactions among all possible mutational combinations for the three mutations introduced (Supplementary Note 6).

Fig. 5: Explanation of non-additive effects in single mutations A and B.

Additivity (ADD) occurs when the sum of the individual effects of mutations A and B is equal to the value in double mutant AB (black diagonal lines). Epistatic effects emerge when the result of combining two individual mutations (or sets of mutations) is non-additive, i.e. the fitness value after combining mutations A with B to generate a doubly mutated AB variant is not equal to the sum of the individual A and B contributions. Epistatic effects can be positive/synergistic or negative/antagonistic depending upon their result with respect to additivity. They occur in the form of: (i) positive magnitude epistasis (+ME) if both the single mutations A and B are beneficial for a fitness trait and they produce a greater-than-additive fitness improvement (or a smaller-than-additive fitness drop if both are deleterious) when combined together in mutant AB; (ii) negative magnitude epistasis (−ME) if both the single mutations A and B are beneficial for a fitness trait and they produce a smaller-than-additive fitness improvement (or a greater-than-additive fitness drop if both are deleterious) when combined into AB; (iii) positive sign epistasis (+SE) if one mutation A is deleterious on its own but can enhance the beneficial effect of another mutation B when combined into AB; (iv) negative sign epistasis (−SE) if one mutation A is beneficial on its own but can enhance the deleterious effect of another mutation B when combined into AB; (v) positive reciprocal sign epistasis (+RSE) if both mutations A and B are deleterious alone, but they produce a beneficial effect when combined into AB and (vi) negative reciprocal sign epistasis (−RSE) if both are beneficial but they produce a deleterious effect when combined into AB.
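
The six categories in this caption can be turned into a small decision rule. The sketch below is not the authors' script (described in Supplementary Note 6) but a minimal illustration that follows the wording of the caption; the function name, the `tol` threshold and the convention that positive values are beneficial are assumptions.

```python
def classify_epistasis(d_a, d_b, d_ab, tol=0.0):
    """Classify the interaction of mutations A and B following the Fig. 5 scheme.

    d_a, d_b, d_ab: effects of A, B and AB relative to the parent
                    (positive = beneficial, negative = deleterious).
    tol: hypothetical threshold below which the deviation counts as additive,
         e.g. the experimental error.
    Returns the label and the deviation from additivity.
    """
    epsilon = d_ab - (d_a + d_b)  # observed minus additive expectation
    if abs(epsilon) <= tol:
        return "ADD", epsilon
    sign = "+" if epsilon > 0 else "-"
    both_deleterious = d_a < 0 and d_b < 0
    both_beneficial = d_a > 0 and d_b > 0
    if (both_deleterious and d_ab > 0) or (both_beneficial and d_ab < 0):
        kind = "RSE"  # the combination reverses the sign of both single effects
    elif d_a * d_b < 0:
        kind = "SE"   # one single mutation is deleterious, the other beneficial
    else:
        kind = "ME"   # same direction throughout; only the magnitude deviates
    return sign + kind, epsilon


if __name__ == "__main__":
    # Arbitrary illustrative effects, not experimental data.
    print(classify_epistasis(-1.0, 2.0, 3.0))
```

Single mutations with an effect of exactly zero fall into the ME branch here, which is a simplification of this sketch rather than a rule stated in the caption.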

We quantified all amino acid interactions among all 6 trajectories leading from parent --- to mutant III for multiple parameters focused on the evolution towards 2β-hydroxytestosterone (Table 1). All combinations on substrate conversion show synergistic effects, with 6 (86%) and 1 (14%) cases of SE and ME, respectively (Supplementary Table 3). For 2β-selectivity, all interactions are likewise synergistic, with most of them showing positive SE and only one case of positive ME (combination of R47I and T49I). For example, the combination of the single mutations R47I, T49I and Y51I (in parent mutant ---) is expected to contribute −3.45 ± 0.25 kJ/mol. The two former mutations confer 15β-selectivity in mutants I-- and -I-, while the latter one induces 2β-selectivity in variant --I. Yet the experimental value of mutant III yields 5.6 ± 0.0 kJ/mol, which represents a difference of about 9 kJ/mol between the experimental and theoretical values (Supplementary Table 4). Whereas NCR has 6 cases of positive epistatic effects (86%) and one negative case (14%), PFR likewise shows 6 cases of positive effects but one additive case (Supplementary Tables 5 and 6). Similarly, CE shows 6 cases of synergistic epistatic effects and 1 antagonistic case (Supplementary Table 7). Finally, TTF and TTN show the same synergistic effects of 70% SE and 30% ME (Supplementary Tables 8 and 9). Overall, these results indicate that an efficient consumption of NADPH and oxidation of testosterone towards formation of 2β-hydroxytestosterone requires pervasive cooperative effects among R47, T49 and Y51 regardless of mutational combination.
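
The 2β-selectivity example quoted above can be written as a single worked equation. The symbol ε for the deviation from additivity and the subscripts "obs" and "add" are notation assumed here for illustration; the additivity equations actually used are those of Supplementary Note 5.

```latex
\varepsilon
  = \Delta\Delta G_{\mathrm{obs}}(\mathrm{III}) - \Delta\Delta G_{\mathrm{add}}
  = 5.6 - (-3.45)
  = 9.05\ \mathrm{kJ\,mol^{-1}}
  \approx 9\ \mathrm{kJ\,mol^{-1}}
```

Because ε is positive and far larger than the quoted uncertainty of ±0.25 kJ/mol on the additive expectation, the interaction is classified as synergistic.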

Table 1 Epistatic analysis of all possible mutational combinations on multiple parameters towards formation of product 2β-hydroxytestosterone.

Conformational dynamics shape the evolution of the fitness landscape

The complete deconvolution of a multi-mutational variant enables the exploration of all possible pathways from parental enzyme to the evolved mutant, thus determining a full multidimensional fitness landscape. Such landscapes provide insights on the different routes that evolution can take. Additionally, engineering proteins by single mutational steps53 is a highly successful strategy in directed evolution2,4,5,38,54,55. To explore the step-wise accessibility in the evolution of parental --- towards III, we constructed a fitness “pathway” landscape56 based on both activity and selectivity (Fig. 6a). This system is a 4-dimensional surface (3 sets of mutations as independent vectors and ΔΔG as the dependent variable obtained from the experimental selectivities). Two kinds of trajectories can be noted: those lacking local minima (favoured) and those characterized by at least one local minimum (disfavoured). Pathways 1–4 are characterized by a decrease in both selectivity and activity at the first step, indicating that they are evolutionarily disfavoured (pathway 3 is highlighted in red). Pathway 5 (highlighted in green) and 6 are favoured because --I enables conformational changes in the active site and has implications on the substrate binding (as discussed above). In the two latter pathways, activity improves slightly in the evolution of --- towards --I (TTF = 11 → 21) at the first step, but at the second and third steps of pathway 6 it increases significantly towards -II and III (TTF = 21 → 72 → 158). This is due to the β4 sheet that shows an increased flexibility in the most active mutants -II and III, highlighting the key role of β4 sheet for activity (Fig. 6b). Interestingly, when all other parameters are considered, pathways 5 and 6 are the only ones that remain accessible (Supplementary Note 7 and Supplementary Fig. 14), with selectivity and CE showing the strongest non-additive effects, thus indicating the importance of residue Y51 towards efficient 2β-hydroxytestosterone formation.
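
The distinction between favoured and disfavoured trajectories can be illustrated with a short check for fitness drops along a pathway. This is a simplified sketch that treats one trait at a time and calls a pathway favoured when the trait never decreases; the function names are assumptions. The first series uses the TTF values quoted above for pathway 6, whereas the second series contains placeholder intermediate values that are not measured data.

```python
def first_blocked_step(fitness):
    """Return the 1-based step at which fitness first drops, or None if it never does."""
    for step, (before, after) in enumerate(zip(fitness, fitness[1:]), start=1):
        if after < before:
            return step
    return None


def is_favoured(fitness):
    """A pathway is treated as favoured when no step lowers the fitness value."""
    return first_blocked_step(fitness) is None


if __name__ == "__main__":
    pathway_6_ttf = [11, 21, 72, 158]   # --- -> --I -> -II -> III (values from the text)
    placeholder = [11, 8, 30, 158]      # hypothetical series with a drop at step 1
    print(is_favoured(pathway_6_ttf), first_blocked_step(placeholder))
```

Running the same check on both ΔΔG and TTF, and requiring both to pass, would mirror the two-trait criterion used for the landscape in Fig. 6a.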

Fig. 6: Stepwise evolution of multiple functions and conformational dynamics.

a Multiparametric fitness pathway landscapes of the 6 evolutionary pathways leading from parent mutant --- upward to mutant III are shown in 3D (left) and frontal (right) format (green and red arrows indicate examples of favourable and unfavourable pathways, respectively). Fitness is defined by activity as total turnover frequency (TTF) displayed as heat-maps from 0 (white) to 170 (blue) as well as 2β-selectivity as ΔΔG in kJ per mol (y-axis) at each evolutionary step (z-axis) for each pathway (x-axis). Green and red bars indicate favoured and disfavoured pathways, respectively. The mutant in red represents steps with disfavoured energy, i.e. the point where the pathway is blocked. The data represent the average of two independent experiments (n = 2). Original data are listed in Supplementary Table 1 and pathway analysis in Supplementary Tables 10 and 11. b Progression of the β4 sheet flexibility along the 6 pathways as revealed by PC analysis (pc2) of the substrate-free simulations of mutated enzymes analysed separately (see complete haem domains in Supplementary Fig. 15). The thickness of the line is proportional to the motion and the colour scale varies from blue (minimum motion) to red (maximum motion). c Evolution of the conformational dynamics along the 6 pathways and its connection to 2β- or 15β-selectivity. The analysis of the global conformational dynamics of the substrate-free simulations of mutated enzymes, as shown by pc1/pc3, indicates that 2β- and 15β-selective mutants explore conformations lying at positive and negative values of pc1, respectively. Colour scale varies from red (less populated) to blue (more populated). See Fig. 1d for mutant abbreviations.

To identify the most important conformational changes in all evolutionary pathways and to describe how distal mutations influence them, we performed extensive MD simulations in the absence of substrate of each variant and applied the dimensionality reduction technique PCA29 to the whole data set (Figs. 6c and 7a). A conformational population analysis resulting from all the accumulated simulation data was generated in terms of principal components (PCs) 1 and 3, which describe the first and third most important conformational differences among all variants (for PC2, see Supplementary Fig. 16). Notably, a clear distinction between 2β- and 15β-selective mutants is revealed through their separation with respect to PC1 (x-axis), suggesting that changes in selectivity are linked to the impact that the introduced mutations have on the enzyme conformational dynamics (Fig. 6c). These conformational changes related to selectivity mainly involve the G helix, the F–G loop, the β1 hairpin and the B’ helix (located at the entrance of the 2a/b channels) as well as the a-A loop and the β4 sheet (located at the entrance of the 2f channel) (Fig. 7b)57,58. In variant -I-, the channel 2a has a narrower substrate access entrance due to a closed state of the F–G loop (ca. 9.3 Å determined between the Cα of R47 and N192). Conversely, the combination of mutations introduced in III favours an open conformational state of the same F–G loop (ca. 12.4 Å measured between the Cα of I47 and N192), enlarging the access channel 2a, which is mainly responsible for allowing access to the enzyme-binding pocket (Fig. 7c). Indeed, the area surrounding the access channel 2a in mutant III is calculated to have a volume of 140 Å3 with respect to 44 Å3 in -I-.

Fig. 7: Analysis of the conformational dynamics of deconvolution mutants.

a Conformational population analysis built from the combined PCA of all substrate-free simulations of mutants. The conformational populations of 2β-selective mutants -II and III are highlighted. All replicas (3/3) of III and -II, and 2/3 replicas of I-I and --I lie on positive values of pc1 (10 out of 12 replicas of 2β-selective mutants). All replicas (3/3) of -I-, and 2/3 replicas of ---, I-- and II- show negative values of pc1 (9 out of 12 replicas of all 15β-selective mutants). b Overlay of the conformational changes involved in PC1. c Zoom of b showing the channel 2a cavity of mutant III (left panel) and -I- (right panel). In b, c, mutant -I- is shown in blue, whereas the evolved mutant III is coloured in red. The F-G loop, β1 sheet and channel 2a cavity are highlighted in teal and magenta for -I- and III, respectively. d Analysis of the most important correlated motions of mutant --- by means of the shortest path map (SPM). Mutational hotspots used in this or previous work41 that appear in the SPM are highlighted with red spheres, whereas positions adjacent to mutational hotspots are highlighted with orange spheres. See Fig. 1d for mutant abbreviations.

To further study the link between epistasis and conformational dynamics, we applied the shortest path map (SPM) analysis22 (Supplementary Note 9) using the accumulated 1.8 µs MD simulation performed on parent --- in the absence of substrate. SPM considers the different conformations that the enzyme samples along the MD simulation and identifies which residues are those that are more important for the observed conformational changes, which in this case are associated with different selectivities and activities22. In the parent --- enzyme, the generated SPM identifies residues Y51 as well as V78 and A330, known from earlier studies40, to be important for enzyme activity and to be interconnected in terms of Cα correlated movements, thus highly contributing to the enzyme inactive-to-active conformational interconversion. This highlights why these three distal positions are found to be key during the evolutionary pathway for improving catalysis, in line with what we observed for the laboratory-evolved retro-aldolases22. Importantly, the SPM also describes strong connections between all the five-stranded β1 sheet with the β4-2 strand and the B’ helix, which we showed to be crucial for substrate binding and gating (Fig. 7d). This long-distance communicating pathway between β1 and β4 sheets directly relates the mutated positions on β1-2 strand (positions R47, T49 and Y51) and the increased flexibility of the β4 sheet. This shows how evolutionary pathways take advantage of networks of residue–residue interactions to fine tune the conformational dynamics along the evolutionary pathways for improving enzyme function.

Discussion

The identification of epistasis and its molecular mechanism are crucial for understanding protein function, but these are hardly ever explored in laboratory evolution studies of multiparametric optimization. For example, the directed evolution of activity and selectivity in plant sesquiterpene synthases59 or in P450BM3 (ref. 60) did not consider epistatic effects, while “stability-mediated epistatic effects” were observed in a P450BM3 study61 some years later5. Conversely, various studies in evolutionary biology have determined the contribution of multiple parameters on epistasis and organismal fitness. Shakhnovich et al., for instance, found that bacterial growth depends on the activity and expression of adenylate kinase62 or on the activity, binding and folding stability of dihydrofolate reductase63.

The construction of fitness landscapes using a single catalytic parameter has been reported in two main research areas. While such landscapes have revealed that usually many pathways are accessible in laboratory evolution of enzymes as catalysts in organic chemistry4,7,56,64, different conclusions have been drawn in evolutionary biology8,9,10,11. In the present study, we observed that only a few trajectories (2/6) are accessible for both selectivity and activity. Interestingly, the two accessible pathways for selectivity correspond to mutation Y51I. The addition of mutation R47I has almost no effect on activity and selectivity, while mutation T49I, which is closer to Y51I, significantly improves both parameters as it alters the enzyme conformational dynamics. T49I and Y51I enhance the flexibility of the β4 sheet, and both combined with R47I reshape the active site for enhanced 2β-hydroxylation. The triple mutant III excels in all parameters compared to all double and single mutants. Unexpectedly, upon going from the “parent” enzyme --- to mutant III, cooperative interactions at each step in the evolution of selectivity and activity (i.e. TTF) remain pervasive, with SE and ME characterizing those effects. Residues R47, T49 and Y51 are located at the entrance of a long substrate channel far away from the active site45. Since the mutated residues were not observed to interact directly with the substrate in our MD simulations, we propose that the observed epistatic effects, which are mediated by long-range interactions, can occur via one main mechanism: direct effects between mutations but no direct interaction between the substrate and the mutations11.
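The notion of an accessible trajectory can be made concrete with a short sketch. With three mutation sites there are 3! = 6 orders in which the parent --- can be converted into III, and a trajectory is counted here as accessible when the chosen parameter increases at every step. The numerical values below are invented for illustration and are not the measured selectivities or activities; the function names are likewise hypothetical, and published analyses often evaluate epistasis on a free-energy (logarithmic) scale rather than on raw values.

```python
from itertools import permutations


def accessible_paths(fitness, n_sites=3):
    """Return all mutational trajectories along which fitness strictly increases."""
    paths = []
    for order in permutations(range(n_sites)):
        genotype = ["-"] * n_sites
        trajectory = ["".join(genotype)]
        for site in order:
            genotype[site] = "I"  # introduce the mutation at this site
            trajectory.append("".join(genotype))
        values = [fitness[g] for g in trajectory]
        if all(b > a for a, b in zip(values, values[1:])):
            paths.append(trajectory)
    return paths


def pairwise_epistasis(fitness, background, single_a, single_b, double):
    """Observed effect of the double mutant minus the sum of the single effects."""
    observed = fitness[double] - fitness[background]
    additive = (fitness[single_a] - fitness[background]) + (
        fitness[single_b] - fitness[background]
    )
    return observed - additive


# Invented example values (arbitrary units), one per genotype.
example = {
    "---": 1.0, "I--": 0.8, "-I-": 0.7, "--I": 2.0,
    "II-": 0.9, "I-I": 2.1, "-II": 3.0, "III": 4.0,
}

paths = accessible_paths(example)
epsilon = pairwise_epistasis(example, "---", "-I-", "--I", "-II")
```

With these invented numbers, two of the six trajectories are accessible and `epsilon` is positive. The middle mutation is deleterious on its own but beneficial once the third site is mutated, which is the pattern described as sign epistasis.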

Our computational exploration of the mutation-induced conformational changes on F87A variants provides key insights into the importance of conformational dynamics for P450BM3 evolution towards more active and selective variants. These simulations predict that activity is dictated by the flexibility of the β4 sheet, which acts as a gate and modulates substrate access to the catalytic haem pocket for efficient hydroxylation. Our simulations also highlight the key role of the F–G loop in open–close conformational transitions involved in substrate binding, as shown for P450BM3 by Shaik58 and for P450PikC by Houk and Sherman65. The substrate-binding simulations show that selectivity is dictated by how the substrate is oriented when accessing the haem pocket through the β4 sheet. By tuning the open/close conformational states of the F–G loop and the β1 hairpin, the substrate access channels are altered, which impacts substrate orientation and thus selectivity. This rich conformational heterogeneity observed for P450BM3, which is important for substrate binding, is in line with previous reports58 and also with the selective stabilization of discrete conformational states of P450CYP119 and P450PikC upon ligand binding65,66. However, our simulations contrast with what was previously observed in P450cam, which does not depend on open/closed conformational changes of the F, G helices and loop for allowing substrate binding49. It should also be mentioned that P450cam complexed with its redox partner adopts an open conformation that stabilizes the active site key for the proton relay network67.

Using SPM analysis, the most important positions that participate in the open/closed conformational conversions that dictate selectivity and activity were predicted. Of relevance is that the key residue Y51 found to be essential for both activity and selectivity in this study is contained in the SPM path, as well as the previously described V78 and A330 positions40. SPM also highlights a long-distance communicating pathway between β4 and β1 where positions R47, T49 and Y51 are located, which is exploited along the evolutionary pathway for altering BM3 protein function.

This study provides evidence that in P450BM3 epistasis is intrinsically linked to conformational dynamics, which fine-tunes multiple functions in a protein involved in secondary metabolism. Our findings on the conformational changes connected to CYP activity and selectivity and residue networks that modulate such conformational conversions can be expected to facilitate future rational evolution of these enzymes for diverse practical applications.

Methods

Chemicals, materials and software

All commercial chemicals were purchased with the highest purity grade (e.g., high-performance liquid chromatography (HPLC)) from Sigma-Aldrich (St. Louis, US) unless otherwise indicated. For protein purification, lysozyme and DNase I were purchased from Applichem (Darmstadt, Germany). For PCRs, KOD Hot-Start DNA Polymerase was obtained from Novagen (Merck, Darmstadt, Germany). Restriction enzyme DpnI was bought from New England Biolabs (Ipswich, US). The E. coli BL21-Gold(DE3) strain, obtained from Novagen (Merck-Millipore) and generally cultured in lysogeny broth (LB) with 50 µg/mL kanamycin (Kan50) as marker (LBKan50), both obtained from Carl Roth, was used for transformation of site-directed mutagenesis reactions as well as for protein overexpression experiments. According to standard molecular biology protocols, electro-competent E. coli cells were prepared using 10% glycerol (Applichem) and transformed with the corresponding plasmids using a “MicroPulser” electroporator (BioRad, Hercules, US) following the manufacturer’s instructions. Oligonucleotides were purchased from Metabion (Martinsried, Germany). The analysis of sequencing reads was performed using the commercial software MegAlign from DNASTAR Lasergene version 11 (Madison, US) and the freeware ApE plasmid editor version 2.0.44 by Wayne Davis. The software used for constructing the fitness pathway landscapes was Surfer version 8 (Golden, US) and the graphs and dot plots were made with GraphPad Prism version 9 (La Jolla, US).

Site-directed mutagenesis

The mutants were created using the MegaPrimer method as reported in ref. 40. Briefly, the P450BM3 mutant F87A gene, already cloned in the pETM11-BM3 plasmid40, was amplified by PCR by using <25 ng template with 2.5 µM of both silent and mutagenic oligos depending on mutant (Supplementary Table 12) in 50 µL of 1× KOD hot start buffer, 2 mM dNTPs (each), 25 mM MgSO4, and 0.5 units of KOD hot start polymerase. The PCR programme started with 1 cycle of 95 °C for 3 min, 5 cycles of 95 °C for 30 s, 62 °C for 1 min, 72 °C for 6 min, 20 cycles of 95 °C for 3 min, 68 °C for 8.5 min, 1 cycle of 68 °C for 10 min and cooling. The samples were treated with 1 µL DpnI and incubated at 37 °C overnight to remove the parent plasmid. For each mutant, 5 colonies were incubated in 4 mL LB and the plasmids were extracted using the commercial kit QIAprep Spin Miniprep Kit from QIAGEN (Hildesheim, Germany). DNA sequencing was conducted with the four respective oligos listed in Supplementary Table 12 by service provider GATC (now Eurofins, Constance, Germany).

P450BM3-based oxidation reactions using purified enzymes

Biotransformation reactions were performed as follows38. Briefly, reactions were performed in 2.2 mL microtitre plate (MTP) format by resuspending the thawed cells in 600 µL of reaction mixture, followed by addition of 6 µL testosterone [stock: 100 mM (dimethylformamide (DMF)); final conc. 1 mM (1%)], plate sealing with Breathe Easier sealing membranes (Sigma) and incubation in an orbital shaker with tray with holders for 96-well plates (Multitron, Infors HT, Switzerland) at 220 rpm and 25 °C for 24 h. The reaction mixture consisted of 100 mM KPi buffer pH 8.0, 100 mM glucose (Applichem), 10% glycerol (Applichem), 1 mM NADP+ (Merck-Millipore or Applichem), 1 U/mL glucose dehydrogenase (GDH-105) obtained from Codexis (Redwood City, US), 5 mM EDTA and 50 µg/mL kanamycin. The reaction was stopped by adding 350 µL of ethyl acetate using a Tecan robotic system (Männedorf, Switzerland) equipped with a liquid handling arm (LiHA), which was controlled using the Gemini software V3.50, followed by centrifugation (10 min, 1100 × g, 4 °C). The organic phase was extracted using the same robotic system but with the multi-pipette option (Te-MO), transferred to 500 µL MTPs (Nunc, Roskilde, Denmark) and left unsealed for evaporation in the fume hood overnight. The dried samples were resuspended in 150 µL acetonitrile and passed through a PTSF 96-well plate filter to remove solid particles (Pall, VWR, Germany) into a new 500 µL MTP (Nunc). The MTPs, which were closed using silicon lids for the corresponding plates, were stored at 4 °C prior to screening.

Steroid hydroxylation screening by HPLC

An LC-2010 HPLC system (Shimadzu, Japan) equipped with four MTP racks was used employing a reverse-phase “250 Eclipse XDB” C18 column of 250 mm (1.8 µm particle size) together with a corresponding pre-column bought from Agilent (Waldbronn, Germany) as stationary phase and installed in the oven at 40 °C. The mobile phase was composed of a mixture of high-purity water generated from the local deionized water supply using a TKA MicroLab water purification system, acetonitrile (CH3CN) and methanol (MeOH). For testosterone (1), a programme of 8 min based on a CH3CN:MeOH:H2O mixture was used: 0 → 3 min (15:15:70), 3 → 5 min (20:20:60), 5 → 6 min (30:30:40), 6 → 7 min (15:15:70). This protocol allows the separation of >14 oxidation products of 1. The retention times of the known and unknown compounds can be found elsewhere38. Data acquisition was done using the Shimadzu LCsolution software version 3, while data analysis was performed with Microsoft Excel 365 MSO version 2012 (16.0.13530.20054) 32-bit.

Large-scale protein expression and purification

The P450BM3 mutants were inoculated into 4 mL LBKan50 broth and cultured overnight in the orbital shaker with tray with adhesive matting for shake flasks (Multitron) at 37 °C and 220 rpm. The overnight culture (4 mL) was transferred into 200 mL TBKan50 in 500 mL shaking flasks. The cultivation continued at 37 °C and 220 rpm for 2–3 h until the OD600 reached ~0.6–0.8, then IPTG was added to a final concentration of 100 µM and the temperature was reduced to 25 °C. After 20 h expression, the cells were harvested by centrifugation at 1100 × g and 4 °C for 15 min. The cell pellets were stored at −80 °C until further processing. The cell pellets were resuspended in buffer (50 mM KPi, 800 mM NaCl, pH 7.5) and disrupted by sonication in an ice bath. The collected lysate was centrifuged for 45 min at 30,000 × g at 4 °C and the obtained brownish-red supernatant was filtered to sterility with a 0.45-µm filter. The lysate obtained was loaded onto the pre-equilibrated nickel affinity column (HisTrap FF, 5 mL, GE Healthcare) with loading buffer (50 mM KPi, 800 mM NaCl, pH 7.5, 2 mM L-histidine). The column was first washed with 10 column volumes of loading buffer, followed by gradient elution using an L-histidine buffer (50 mM KPi, 80 mM L-histidine, pH 7.5) until complete protein elution. Columns were stripped and recharged between each mutant to avoid cross contamination. A flow rate of 5 mL/min was used and all fractions showing absorption at 417 nm were collected. Proteins from the flow through were pooled and the buffer was exchanged to 25 mM KPi (pH 7.5) by ultrafiltration using a 50 kDa Amicon Ultra centrifugal filter (Merck-Millipore) and then concentrated to 5 mL. To remove the bound endogenous fatty acid, gravity-flow protein purification with Lipidex 1000 (Perkin Elmer) chromatography was conducted.
Ten millilitres of Lipidex resin stored in methanol was used for column packing, which was subsequently washed with 10 column volumes of water and 10 column volumes of buffer (25 mM KPi, pH 7.5). After that, the protein was applied onto the column. The column was then capped to leave the protein in contact with resin at room temperature for 1 h, allowing hydrophobic compounds to bind to the resin. The protein was completely eluted from the Lipidex resin with buffer (25 mM KPi, pH 7.5) and the column was cleaned with at least 10 column volumes of methanol. The purified protein was pooled, and the buffer was exchanged to 100 mM KPi (pH 8.0) by ultrafiltration using a 50 kDa Amicon Ultra centrifugal filter, and then concentrated to 1 mL and stored at −80 °C for further use. An aliquot was thawed at room temperature and enzyme concentration was determined by CO difference spectrum analysis prior to usage. The enzyme concentration determined for all intermediate mutants is shown in Supplementary Table 1.

Determination of kinetic parameters using isolated enzymes

The kinetic experiments were performed using a JASCO V‐650 spectrophotometer (JASCO International CO., LTD, Japan) equipped with a PAC‐743 Peltier temperature control unit and UV‐Vis‐NIR Spectra Manager software II. All assays were performed in 100 mM potassium phosphate buffer (pH 8.0) at 25 °C using quartz cuvettes adapted for magnetic stirring (900 rpm). NADPH consumption was determined by monitoring NADPH depletion at 340 nm (ε = 6.22 mM−1 cm−1). A concentration of 0.24 mM NADPH was used in the reaction mixture. Due to uncoupling reactions, where NADPH is consumed without substrate hydroxylation, the rates were calculated by subtracting the rate of NADPH consumption in the absence of substrate. Reactions containing 0.2 mM testosterone dissolved in DMF with a final solvent concentration of 1% (v/v) were started with addition of 100 nM P450BM3 enzyme in a final volume of 1 mL and these were monitored until NADPH was completely depleted, as measured by no change in absorbance at 340 nm (completion of the reaction). Afterwards, the reaction mixture was immediately transferred into 96-well MTPs and frozen at −20 °C. Reaction mixtures of 600 µL were taken and mixed with ethyl acetate (2 × 150 μL) with the LiHA of the Tecan robot platform (dispensing speed, 600 µL/s). The organic phase was extracted using the Te-MO multi-pipette option and transferred to 500 µL MTPs (Nunc, Roskilde, Denmark). The solvent was dried overnight, and the next day the steroid was resuspended in 150 µL acetonitrile and passed through a PTSF 96-well plate filter to remove particles (Pall, VWR, Germany) into a new 500 µL MTP (Nunc). The MTPs were stored at 4 °C prior to screening. The kinetic parameters are shown in Supplementary Table 1.
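The conversion from an absorbance trace to a background-corrected NADPH consumption rate follows from the Beer–Lambert law and can be sketched as follows. The extinction coefficient (6.22 mM−1 cm−1) and the enzyme concentration (100 nM) are taken from the text; the 1 cm path length, the function names and the example slopes are assumptions chosen for illustration and are not measured data.

```python
EPSILON_NADPH = 6.22  # mM^-1 cm^-1 at 340 nm (value given in the text)


def nadph_rate(slope_abs_per_min, path_cm=1.0):
    """Convert dA340/dt (absorbance per min, negative on consumption) to µM NADPH per min.

    path_cm is an assumed cuvette path length.
    """
    rate_mM_per_min = -slope_abs_per_min / (EPSILON_NADPH * path_cm)
    return rate_mM_per_min * 1000.0  # mM -> µM


def corrected_turnover(slope_with_substrate, slope_without_substrate,
                       enzyme_uM=0.1, path_cm=1.0):
    """NADPH turnover (min^-1) after subtracting the substrate-free background rate."""
    rate = nadph_rate(slope_with_substrate, path_cm)
    background = nadph_rate(slope_without_substrate, path_cm)
    return (rate - background) / enzyme_uM


# Hypothetical slopes: -0.0622 A/min with testosterone, -0.00622 A/min without.
turnover = corrected_turnover(-0.0622, -0.00622)
```

With these hypothetical slopes the raw rate corresponds to about 10 µM NADPH per minute, the background to about 1 µM per minute, and the corrected turnover to roughly 90 min−1 for 0.1 µM enzyme.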

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.