A promiscuous cytochrome P450 aromatic O-demethylase for lignin bioconversion

Microbial aromatic catabolism offers a promising approach to convert lignin, a vast source of renewable carbon, into useful products. Aryl-O-demethylation is an essential biochemical reaction to ultimately catabolize coniferyl and sinapyl lignin-derived aromatic compounds, and is often a key bottleneck for both native and engineered bioconversion pathways. Here, we report the comprehensive characterization of a promiscuous P450 aryl-O-demethylase, consisting of a cytochrome P450 protein from the family CYP255A (GcoA) and a three-domain reductase (GcoB) that together represent a new two-component P450 class. Though originally described as converting guaiacol to catechol, we show that this system efficiently demethylates both guaiacol and an unexpectedly wide variety of lignin-relevant monomers. Structural, biochemical, and computational studies of this novel two-component system elucidate the mechanism of its broad substrate specificity, presenting it as a new tool for a critical step in biological lignin conversion.

L ignin is a heterogeneous, aromatic biopolymer found in abundance in plant cell walls where it is used for defense, structure, and nutrient and water transport 1 . Given its prevalence in plant tissues, lignin is the largest reservoir of renewable, aromatic carbon found in nature. The ubiquitous availability of lignin in the environment, coupled to its inherent structural heterogeneity and complexity, has led to the evolution of microbial strategies to break lignin polymers down to smaller fragments using powerful oxidative enzymes secreted by rot fungi and some bacteria [2][3][4] . These lignin oligomers can be further assimilated as carbon and energy sources, through at least four known catabolic paradigms 5 .
The most well understood aromatic catabolic mechanism, mainly studied in aerobic soil bacteria, relies on the use of nonheme iron-dependent dioxygenases to oxidatively ring-open structurally diverse, lignin-derived aromatic compounds 5,6 . These dioxygenases act on central intermediate substrates, such as catechol, protocatechuate, and gallate, either in an intra-or extradiol manner. Lignin is primarily based on coniferyl (G) and sinapyl (S) alcohol subunits, which exhibit one or two methoxy groups on the aromatic ring, respectively. Nearly all ligninderived compounds must therefore be O-demethylated to diols before they can be oxidatively cleaved to generate ring-opened compounds, which are ultimately routed to central carbon metabolism (Fig. 1) 7 . More recently, the same aromatic-catabolic pathways have been invoked as a potential means to convert lignin to useful products in biorefineries 4,[7][8][9][10][11] . O-demethylation is therefore a critical reaction for assimilating lignin-derived carbon in both natural carbon cycling as well as in emerging biotechnology applications.
Earlier reports from Eltis et al. and Bell et al. described cytochrome P450-based demethylation of aromatic compounds; though either the full gene sequences were not reported until recently 13,21 , or the para-substituted substrate (4-methoxybenzoate) was of limited interest for the lignin degradation problem 22,23 . Similarly, Dardas et al. found evidence of a P450 in Moraxella GU2 responsible for the O-demethylation of guaiacol and guaethol; however, neither the gene sequence nor identity of the P450 or its reductase partner was isolated 24 .
The relatively narrow substrate specificities elucidated to date for aryl-O-demethylation, coupled to the potentially broad distribution of structurally distinct, methoxylated lignin products found in nature, prompted us to search for alternative mechanisms for this key reaction. Because G-unit monomers constitute a majority of plant-derived lignin, we initially focused on Odemethylation of guaiacol (2-methoxyphenol), which in turn represents the simplest G-unit monomer derivable from lignin. As reported in a companion study 21 , we isolated a cytochrome P450-reductase gene pair, gcoAB, from Amycolatopsis sp. ATCC 39116 (encoding proteins with accession numbers WP_020419855.1 and WP_020419854.1). Introduction of this pair via plasmid-based expression into Pseudomonas putida KT2440, a robust aromatic-catabolic bacterium, was sufficient to confer growth on guaiacol 21 . Here, we report a comprehensive structural, biochemical, and computational description of this new cytochrome P450-based mechanism for aryl-O-demethylation. Unlike other known tetrahydrofolate-or non-heme irondependent demethylases, which are fairly substrate specific, the P450-reductase pair characterized here (GcoAB) demethylates diverse aromatic substrates, potentially providing an important advantage in both natural and biotechnological contexts. The results presented here suggest a remarkably flexible active site that may promote promiscuous substrate usage.

Results
GcoA crystal structures suggest broad substrate specificity. The X-ray crystal structures of GcoA (in complex with guaiacol) and GcoB were determined to resolutions of 1.4 Å and 1.7 Å, respectively (Figs. 2 and 3, Supplementary Figs. 1-7, and Supplementary Table 2). The GcoA structure reveals a typical P450 single-domain architecture with a central heme adjacent to a buried active site, captured with the substrate access loop in the closed position (Fig. 2a, b). GcoA possesses a broadly hydrophobic pocket with the two oxygen atoms of the substrate coordinated by backbone carbonyl and amide nitrogen groups from residues Val241 and Gly245, respectively. A series of hydrophobic amino acids is responsible for positioning the aromatic ring, including a triad of phenylalanine residues lining the active site   (Fig. 2c). The guaiacol methoxy carbon is positioned at a distance of 3.92 Å from the center of the heme iron, with the plane of the heme appearing to exhibit a slightly domed architecture.
Crystallization screening for a range of guaiacol-analog-bound complexes produced three additional well-diffracting co-crystals with guaethol, vanillin, and syringol, where each ligand was fully occupied in the active site ( Fig. 2d-f). In each structure, a variety of alternative means to accommodate the additional aryl side chains were observed in the active site. Guaethol, the most similar chemical structure to guaiacol, is easily accommodated with a small translation of the ring and without any rearrangement of the active site around the additional methyl group. Vanillin, being bulkier with an aldehyde group at the 4-position, induces a significant reorientation of Thr296, forming a new H-bond between the residue and the substrate. This results in the twisting of the carboxylate group on the adjacent heme propionate, eliminating the H-bond with Thr296 observed in the guaiacolbound structure, and instead forming an alternative H-bond with a water molecule in between the two heme propionates. In contrast, syringol is accommodated by a combination of a rotation in the plane of the aromatic ring about the hydroxyl group relative to guaiacol, and an expansion of the hydrophobic residues lining the active site cavity.
GcoB has an unusual three-domain structure. Most bacterial cytochrome P450s require two partner proteins, usually a cytochrome P450 reductase or a ferredoxin and ferredoxin reductase, that transfer electrons from a carrier such as NAD(P)H to the cytochrome 25 . Analysis of the GcoB sequence, however, suggests it contains all of the necessary domains in a single polypeptide, with an N-terminal 2Fe-2S ferredoxin domain followed by an FAD and NAD(P) binding region with homology to ferredoxin-NADPH reductase (FNR) type oxido-reductases.
Structurally, the compact, N-terminal 2Fe-2S domain of GcoB (Fig. 3a, b) bears strong homology to putidaredoxin (Pdx), a ferredoxin that transfers electrons to P450cam in the threecomponent camphor hydroxylase system from P. putida (Supplementary Fig. 6-8). The C-terminal region consists of an FAD-binding domain, containing 6 beta-strands and a single alpha helix, followed by an NADH-binding domain. These Cterminal domains show structural homology to FAD-type cytochrome P450 reductase (CPR 26 ) domains in which the Cterminal portion primarily stabilizes the isoalloxazine moiety of FAD while the N-terminal domain coordinates the diphosphate bridge between the flavin and adenosine groups (Fig. 3c, d). A structural comparison of the NADH-binding domain with the NADPH binding domain from a related CPR is strongly predictive of a binding preference for NADH over NADPH. While the individual domains are structurally similar to other CPR proteins, the overall domain architecture is not, and instead it is highly conserved with reductase proteins such as BenC 27 , the benzoate 1,2-dioxygenase reductase from A. baylyi ADP1, which supplies electrons to Rieske-type aromatic ring-hydroxylating dioxygenases. This suggests an unexpected convergence in the organization of the reductase partners for these evolutionarily distinct oxyenases.
GcoA and GcoB form a dimer complex in solution. GcoA and GcoB operate as a multi-domain complex; guaiacol is processed exclusively in GcoA, with electrons supplied by the GcoB redox machinery. When expressed and purified individually, each protein was shown to be monomeric in solution using a combination of hydrodynamic methods (Supplementary Table 3). Sizeexclusion chromatography revealed a strong interaction between GcoA and GcoB ( Supplementary Fig. 9) which was confirmed by analytical ultracentrifugation, indicating that GcoAB is a heterodimer in solution. GcoA has a characteristic basic pocket on the proximal face, previously identified in other P450 systems [28][29][30] as the docking surface for the associated reductase partner ( Supplementary Fig. 10). Similarly, the surface of GcoB is predicted to have an acidic patch that interfaces with the  30 . From the GcoB crystal structure and surface charge representation, and comparison to ferredoxins homologous to GcoB 21 , it is likely that the interacting face of GcoB is buried at the interface between the ferredoxin domain and the FAD binding domain in the reported crystal structure. Drawing parallels from Huang et al. 31 , a conformational change of GcoB likely precedes binding with GcoA. A possible schematic for this process is shown in Supplementary  Fig. 11. While the organization of P450 systems is diverse 32 , the architecture of this 2-protein system, combining a catalytic P450 with a 2Fe-2S ferredoxin domain in addition to FAD and NAD binding sites, represents a new class. The linear domain diagram shown in Fig. 3e allows comparison to standard class descriptions as reviewed by Guengerich and Munro, and illustrates the distinct nature of GcoAB 33 .
The reductase activity of GcoB was monitored using NADH or NADPH in a cytochrome c reduction assay   35 . In agreement with our comparative structural analysis ( Supplementary Fig. 7), the reaction with NADH was over 50-fold faster than with NADPH (k cat = 44 ± 1 s −1 , K M = 16 ± 0.2 μM, 25°C, pH 7.5; all activity measurements referenced per active cofactor). Demethylation of guaiacol, which binds tightly to GcoA (K D = 6 nM; Table 1) was then monitored over time via NADH disappearance at 340 nm under steady-state conditions (Fig. 1b). The resulting k cat (6.8 ± 0.5 s −1 ) was approximately sixfold less than the k cat for the GcoB reduction reaction alone, suggesting that the overall rate of the two-enzyme catalytic cycle is limited by steps involving GcoA. The value for k cat is similar to or greater in magnitude than k cat for other known, non-P450-type O-aryl-demethylases acting upon their preferred substrates, including LigM (5.8 ± 0.25 s −1 ), LigX (6.1 ± 0.2 s −1 ), or PODA (0.034 s −1 ) (Supplementary Table 4).  consumed, as monitored by UV/vis or fluorescence quenching (vanillin reaction) d Standard deviations are representative of three or more independent measurements e Guaethol is de-ethylated by GcoAB, forming catechol and acetaldehyde instead of formaldehyde f Vanillin and syringol are the only partial substrates/partial uncouplers of those listed. As such, the Michaelis-Menten parameters were not measured using the described NADH oxidation assay Ten structurally diverse O-methylated aromatic compounds were screened for activity with GcoA. Seven of these induced measurable NADH consumption (Fig. 4) and K D values spanning 70 nM to 37 μM (Table 1). All nonetheless yielded K M (O-methylaryl) values comparable to K M (guaiacol), indicating that all form catalytically productive complexes with GcoA. Only three (vanillate, ferulate, and veratrole) had no detectable binding or catalytic interaction with GcoA.
We next examined whether NADH consumption was coupled to substrate demethylation. Aldehyde and the demethylated aromatic products were quantified and compared to the total concentration of NADH consumed following quenching of a reaction containing saturating amounts of each substrate (Table 1, Supplementary Fig. 19). For all substrates tested except guaethol, in which acetaldehyde is produced, formaldehyde was the expected product. Five of the seven substrates that produced aldehyde (guaiacol, guaethol, 3-methoxy-catechol, anisole, and 2methyl-anisole) did so in a~1:1 ratio with NADH consumed (Fig. 4). Syringol and vanillin stimulated NADH turnover but produced less than stoichiometric amounts of formaldehyde. We hypothesized that these two substrates bind in the active site in the same location as guaiacol, displacing water and stimulating the spin-state change that permits reduction of the heme iron 22 . However, the reaction with O 2 is uncoupled to substrate demethylation in some proportion of turnovers, likely leading to H 2 O 2 release. For all 7 substrates, the expected hydroxylatedaryl product was detected and its identity confirmed by matching its HPLC retention time with known standards. However, we observed variable stability of these hydroxylated compounds in air; hence, the quantity of aldehyde product is likely a better indicator of the extent to which NADH consumption is coupled to aldehyde production.
Examining the structures of the 10 tested substrates in light of the crystallographic data suggests an emerging structure-activity relationship ( Supplementary Fig. 5). First, the efficiency of guaethol as a substrate shows that the enzyme is capable not only of accommodating the larger ethoxy group (Fig. 2d) but also of catalyzing de-ethylation. Second, there appears to be a varying degree of flexibility in the permissible substituents around the aryl ring. C1 (guaiacol numbering, Fig. 4) can accommodate -OH (guaiacol, guaethol, 3-methoxy-catechol), -CH 3 (2-methyl-anisole), or -H (anisole) with comparable efficiency constants [k cat / K M (O-methyl-aryl)]. However, veratrole, which has -OCH 3 at this position, is a non-substrate/non-binder, suggesting that steric constraints supplied by the nearby α-helix ( Fig. 2c) might limit the size of substituents here. Similarly, the carbon ortho to the -OH of guaiacol can have either -H or -OH (3-methoxy-catechol) substituents without substantial penalty to the efficiency constant, though -OCH 3 (syringol) again leads to a partial substrate, partial uncoupler. Finally, substituting the -H at the C4 guaiacol position, which is closest to the side chain of Thr296, with a formyl group (vanillin) likewise yields a partial substrate, partial uncoupler; however, carboxylic acid (vanillate and ferulate) substituents preclude binding and/or demethylation. The partial demethylation observed for syringol and vanillin, despite the fact that both assume a guaiacol-like binding mode in the crystal structures ( Fig. 2), suggests that dynamic factors may be important for understanding the efficiency of the GcoAsubstrate interaction.
Enzyme opening and closing are key steps in GcoA catalysis. Some P450 enzymes are known to undergo conformational opening and closing motions during their catalytic cycles 37 . While the structures of GcoA obtained in this study were crystallized in an apparent closed state, molecular dynamics (MD) simulations indicated that an in silico-generated apo form of GcoA can spontaneously open on the time scale of 1 μs, with the structural changes occurring mainly in the F and G helices ( Fig. 5a We subsequently employed umbrella sampling (US) to obtain the free energy profile for the opening-closing motion of the Local changes in the active site are closely related to the opening-closing motions of GcoA. Simulations show that residues Phe75, Phe169, and Phe395, which surround the aryl ring of the substrate in the active site as seen in Fig. 2c, undergo breathing motions as a function of the GcoA opening-closing movement. Larger deviations in the positions of the Phe residues are associated with open states of GcoA, as seen in Fig. 5d, which shows the correlation between the open-closed motions in GcoA:apo, as measured by the reaction coordinate computed along simulations, and breathing motions of residues Phe75, Phe169, and Phe395 ( Fig. 5e-g), measured by their RMSD relative to the crystal structure (details of the Phe aromatic center distances are presented in Supplementary Fig.  23). The presence of guaiacol or catechol reduces the breathing motions of the Phe residues ( Supplementary Fig. 21), as their RMSD never reaches values >3 Å, in contrast to the RMSD in GcoA:apo, which can reach values greater than 4 Å. In the configuration shown in Fig. 5e, which occurs in the crystal structures and corresponds to an RMSD <2 Å, the three Phe residues are close to each other and interact directly with guaiacol. This configuration tends to occur when GcoA is closed in the simulations (Fig. 5d, Supplementary Fig. 21). In the configuration in Fig. 5f, which corresponds to the 2-3 Å RMSD range and occurs when GcoA is partially open (with the reaction coordinate around 0), the side chain of Phe169 deviates from its crystallographic position, but the three Phe residues still interact with the substrate and exclude water from the active site. In the configuration in Fig. 5g, which shows the most open state of GcoA:apo (when the reaction coordinate assumes values less than 0 and at an RMSD value >3 Å), the Phe side chains move even further apart from each other, expanding the binding site and  Fig. 21) and help explain the structure-activity relationships in this enzyme. GcoA:guaethol exhibits less flexibility than GcoA:guaiacol, as observed by the limited opening-closing motions and binding site expansion. Given an almost fivefold reduction in k cat for guaethol compared to guaiacol, it appears that such flexibility is required for optimal substrate turnover. GcoA:syringol and GcoA:vanillin, on the other hand, are both more flexible than GcoA:guaiacol and more prone to opening-closing transitions and expansion of the active site. This indicates that syringol and vanillin, which bind to the active site, stimulate NADH turnover, but are not stoichiometrically demethylated (Table 1), are less effective in maintaining the enzyme in the closed state than guaiacol and guaethol. This suggests that successful protein engineering for alternative substrates will require careful consideration to balance conformational flexibility with productive binding and catalysis, and these data provide a route to help define the optimum window.
Proposed reaction mechanism for guaiacol O-demethylation. Having identified that GcoA contributes the rate limiting step in this 2-protein system, density functional theory (DFT) calculations were used to investigate the mechanism for guaiacol Odemethylation. The putative enzymatic reaction is shown (Fig. 6) with guaiacol as the modeled substrate. DFT calculations using a truncated model system identified two possible reaction pathways (path A and path B) that GcoA could catalyze, which rely to two different approaches of the guaiacol substrate to the Fe = O active species (Fig. 6) (see Supplementary Table 5  The main difference between the two rate-limiting transition states (TSs) is the conformation that the guaiacol substrate adopts with respect to the Fe = O active species. TS1-a 4 corresponds to the HAT transition state when the substrate/Fe = O orientation is similar to the one observed in the substrate-bound crystal structure (see Figs. 2c and 6b); TS1-b 2 is the lower energy HAT TS (1.5 kcal/mol lower than TS1-a 4 ). In this case, the substrate orientation allows the guaiacol hydroxyl group to interact by Hbond with the Fe = O, stabilizing the TS but also permitting the second OH H-abstraction (Fig. 6). The direct comparison of the two rate-limiting TSs proved the intrinsic preference of the substrate to react following path B over path A. Nevertheless, the strong preference of the substrate to bind in the specific orientation found in the crystal structure as observed during the course of the MD simulations ( Fig. 5f and Supplementary  Fig. 27), indicates that path A will be followed although it is energetically less favorable.
Alternatively, open-shell singlet biradical intermediate 4 in path B could form the less stable zwitterionic closed-shell electronic configuration (1.6 kcal/mol higher in energy than the biradical) and further react with a water molecule to generate the hemiacetal 3. The absence of water molecules in the active site environment when guaiacol is bound as observed from our MD simulations ( Supplementary Fig. 28) argues against this possibility.

Discussion
Recent efforts from multiple groups have attempted to harness aromatic catabolism for productively utilizing lignin [8][9][10]18,[39][40][41][42][43][44] . As a single microbe is unlikely to have the full complement of necessary catabolic enzymes for lignin bioconversion, a key component of such synthetic biology strategies is the introduction of foreign catabolic genes to expand substrate specificities of the host microbe. Bacterial enzymes that catalyze the demethylation of lignin-derived aryl-methoxy substrates are of particular interest, as the demethylation reaction presents a bottleneck for the conversion of lignin into desirable products. Currently, Rieske non-heme iron monooxygenases 14

ARTICLE
From a metabolic engineering standpoint, the GcoAB system offers a number of potential advantages. First, the native substrate of GcoA, guaiacol, is a major breakdown product of plant lignin. Demethylation of guaiacol yields catechol, which can be ringopened via either intra-or extra-diol cleavage catechol dioxygenases. Second, compared to other known O-aryl-demethylases, the substrate preferences of GcoA are intrinsically broad, admitting a variety of guaiacol analogs that are also known lignin breakdown products. Third, we anticipate a P450 system to be amenable to further tuning using directed evolution techniques 45 . A prior report of a closely related cytochrome P450 that can demethylate 4-methoxybenzoate 12 suggests that the GcoA active site may be modified to admit larger, more hydrophilic, ligninderived substrates such as ferulate or vanillate. Indeed, genes encoding putative homologs to the two-component GcoA and GcoB system described here are predicted in the genomes of several bacterial species belonging to the genera Rhodococcus, Streptomyces, and Gordonia, among others, and the substrate preferences for this diverse group remain unclear, but offer a promising platform for further exploration and engineering. Moreover, work from Bell et al. revealed an unrelated Rhodopseudomonad cytochrome P450 can also demethylate 4methoxybenzoate and be productively engineered to accommodate 4-ethylbenzoate. While retaining a classical P450 fold, this CYP199A4 system exhibits an alternative binding mode in terms of both substrate positioning relative to the heme, and steric selectivity with an alternative set of aromatic residues lining the active site pocket, further demonstrating the diversity within this class of enzymes. Fourth, a heme-based P450 may offer a simpler alternative for aromatic demethylation compared to tetrahydrofolate-dependent O-demethylases, given the relative ubiquity of P450s and robust heme biosynthetic pathways in potential bacterial hosts. Finally, distinct from most P450 systems, the GcoB reductase is encoded as a single polypeptide rather than two.
Close examination of the GcoA-guaiacol active site shows that substrate binding involves interactions between the peptide backbone and the substrate hydroxyl (ring C1) and methoxy groups (C2). The ring C3 position has a relatively close (3.8 Å) interaction with the porphyrin-γ-meso carbon that bridges the propionate-substituted pyrrole rings. However, the remaining ring positions are not directly occluded by backbone/porphyrin atoms. Reactivity studies showed that guaiacol analogs with substitutions at C4, C5, and C6 remained substrates with comparable efficiencies (k cat /K M ) to guaiacol itself. Even substitutions at C1 and C2 (ethoxy-for methoxy-) are permitted. Comparison of the structures of the guaiacol, guaethol, vanillin, and syringol ligand-bound GcoA structures shows that all assume a similar binding mode with only subtle reorganization of the surrounding active site residues.
MD simulations suggest that the active site opens and closes in response to substrate binding. This flexibility in the active site, in which several side chains (e.g., Phe75, Phe169, and Phe395) reorganize to accommodate the bound ligand, may be partly responsible for the observed substrate promiscuity of GcoA. Though the active site is flexible, potential substrates must be able to maintain the closed state of the active site in order to prevent the uncoupling of NADH oxidation from substrate hydroxylation. The same simulations also suggest that the active site constrains the binding mode of guaiacol, so that the methoxy group points toward and the hydroxyl group is oriented away from the heme iron. This may forestall the lowerenergy cyclization reaction pathway predicted for guaiacol by DFT calculations.
Together, the structural, biochemical, and computational data presented here suggest a GcoA active site that is sufficiently accommodating to turnover a range of substrates that each react in the desired fashion to release an aldehyde product. We hypothesize that the substrate range and consequently the utility of GcoA may be extended even further, to accommodate important G-and S-type lignin subunits, by protein engineering or directed evolution. Tests of this hypothesis are the subject of ongoing work.
Protein expression and purification. The genes for gcoA and gcoB were amplified from Amycolatopsis sp. ATCC 39116 genomic DNA and assembled separately using NEBuilder® HiFi DNA Assembly Master Mix (New England Biolabs) into the pGEX-6P-1 vector (GE Lifesciences), which codes for an N-terminal glutathione-S-transferase (GST) fusion tag. Oligonucleotide primer sequences are provided in Supplementary Table 1. Expression constructs were expressed in E. coli Rosetta™ 2 (DE3) cells (Novagen). Cells were transformed with plasmids containing either the GcoA or GcoB fusion construct and plated out on lysogeny broth (LB) agar containing chloramphenicol (34 mg/L) and carbenicillin (50 mg/L). A single colony was selected and used to inoculate a 20 mL starter culture of LB. After overnight growth at 37°C, 250 rpm, the starter culture was inoculated into 2.5 L flasks containing 1 L of either terrific broth (TB) (GcoA) or LB (GcoB) with antibiotics. At an OD 600 of 0.5 (GcoB) or 1.0 (GcoA), 0.2 mM IPTG was added to induce protein expression. Additionally, 100 mg/L 5-aminolevulinic acid (GcoA), or 200 mg/L ammonium iron(III) citrate (GcoB) was added to support productive cofactor incorporation. Induction of protein expression was performed for 16-18 h at 20°C with shaking at 250 rpm. Affinity purification was carried out using glutathione-sepharose 4B media (GE Lifesciences) followed by GST-tag cleavage with PreScission protease (GE Lifesciences). Anion exchange chromatography was performed using a MonoQ 5/50 GL column (GE Lifesciences) for GcoB with a 0-50% gradient of 50 mM HEPES, pH 7.5, 1 M NaCl, 1 mM DTT, and with a Source 30Q column (GE Lifesciences) with a 10-40% gradient of the same buffer for GcoA. For each protein, a final gel filtration step was performed using a HiLoad S200 16/60 pg column (GE Lifesciences) in a buffer containing 25 mM HEPES, pH 7.5 and 50 mM NaCl. Preparation of a GcoA SeMet derivative was achieved by expression in Selenomethionine Medium Complete (Molecular Dimensions) according to the manufacturers protocol and purification was performed using the same method as used for the native GcoA protein.
UV/vis spectroscopy of GcoA and GcoB. The spectra of GcoA and GcoB were measured in 25 mM HEPES, 50 mM NaCl at pH 7.5 using a Lambda 25 spectrophotometer (Perkin Elmer) over a wavelength range of 200-600 nm at 1 nm intervals in a quartz cuvette (Hellma Analytics).
Heme quantification. Catalytically active heme bound to GcoA was determined via a spectrophotometric/CO-binding assay 35 . CO gas was bubbled into a cuvette containing 0.94-2.5 μM GcoA (Pierce BCA assay). Excess sodium dithionite (~1 mg) was added to reduce the heme iron. The spectrum was recorded over a period of several minutes as a peak centered at~450 nm gradually appeared, attributed to the catalytically competent, ferrous CO-bound heme. A spectrum for a control containing only dithionite-reduced GcoA was measured, and a difference spectrum computed. Absorbances at 420, 450, and 490 nm were recorded to calculate the amount of active GcoA (P450) or inactive GcoA (P420 nm). The equations used to compute the concentrations of catalytically competent and inactive heme 35 are shown below. Reported values are the average of three or more measurements.
nmol of P450 per mL x À0:041 Here ΔA450 and ΔA420 are the differences between the reference and sample spectra at absorbances 450 and 420 nm, respectively.
The pyridine hemochrome assay was additionally used to assess the total heme content in GcoA 46 . Volume of 200 µL of a 6 µM GcoA solution was added to 797 µL 50 mM NaOH, 20% pyridine and 3 µL K 3 Fe(CN) 6 and a UV/vis spectrum measured. Excess (2-5 mg) sodium dithionite was used to reduce the heme and the absorbance at 556 nm was compared to the oxidized spectrum (ΔA). The Beer-Lambert law was used to calculate the amount of heme present, using ε 556 = 28.4 mM −1 cm −1 . Reported values are averaged from three or more measurements. An extinction coefficient for GcoA-bound heme, using the [GcoA-heme] determined from the CO binding assay, was estimated via the slope of a line relating absorbance at 417 nm (Soret peak) to [GcoA-heme].

Determination of [FAD] and non-heme [Fe] in GcoB. FAD was released from
GcoB by denaturing 200 μL of a protein (0.024 µM) solution with 5 μL saturated ammonium sulfate, pH 1.4 (7% v/v H 2 SO 4 ), similar to studies with related cytochrome P450s 47 . Precipitated protein was pelleted by centrifugation and the UV/vis spectrum of the FAD-containing supernatant was measured. The absorbance at 454 nm, ε FAD = 11.3 mM −1 cm −1 , and total protein concentration determined by the Bradford assay were used to determine [FAD] bound to GcoB. An extinction coefficient for GcoB-bound FAD was estimated via the slope of a line relating absorbance at 454 nm to [GcoB-FAD].
The Fe-S content of GcoB was assessed both by quantifying non-heme Fe(II) 48 and spectroscopically characterizing the cluster as a whole (below). GcoB was denatured as described above. Volume of 50 μL of supernatant was added to 25 μL of 5% w/v sodium ascorbate to reduce the iron. Volume of 100 μL of bathophenanthroline disulfonate (0.1% w/v in ddH 2 O) was added and the sample was incubated for 1 h. The resulting Fe(II) complex was quantified via its absorbance at 535 nm (ε 535 = 22.14 mM −1 cm −1 , determined using FeSO 4 standards). An extinction coefficient for GcoB-bound 2Fe-2S cluster was estimated via the slope of a line relating absorbance at 423 nm to [GcoB-2Fe-2S].
EPR and UV/vis spectroscopic characterization of FeS cluster. A 150 μM sample of GcoB was brought into an MBraun chamber and exchanged into anaerobic 50 mM Tris, 200 mM NaCl, 5% glycerol, pH 7.0. The sample was then reduced anaerobically with 10 mM sodium dithionite and loaded into an EPR tube. The tube was capped prior to removing from the chamber and frozen in liquid N 2 . X-band EPR spectra were recorded on a Bruker E-500 spectrometer equipped with a Super High Q (SHQ) resonator, in-cavity cryogen-free system (ColdEdge Technologies), and MercuryiTC temperature controller (Oxford). Spin quantifications were determined by comparison to copper standards at 75, 100, and 125 µM via double integration of the spectra after baseline subtraction using the Ori-ginPro software package. Spectral simulation was carried out with the EasySpin software package for g-value determination 49 . Fe X S Y clusters of different nuclearities have EPR spectra with distinct temperature dependencies. Based on the cluster type determined via EPR, the [Fe(II)] was found to be in close agreement to the results from the bathophenanthroline disulfonate assay described above. Thus, the colorimetric assay was used to determine all subsequent [2Fe-2S] and to find the ε 423 nm .
Reductase activity measurement of GcoB. GcoB activity was assessed using a continuous colorimetric assay involving cytochrome c as a colorimetric electron acceptor 35 . A total of 4.8 nM GcoB (referenced to [FAD]) and 42 µM cytochrome c (from equine heart) were dissolved in buffer (25 mM HEPES, 50 mM NaCl, pH 7.5), 25°C. 100 µM NADH was then added to initiate the NADH-and GcoBdependent reduction of cytochrome c. UV/vis spectra (Varian Cary 4000, Agilent) were recorded in the scanning kinetics mode. The increase in absorbance at 550 nm due to reduced cytochrome c was monitored over time, and the specific activity (nmol reduced cytochrome c min −1 nmol GcoB −1 ) calculated using: 35 ΔA 550 min À1 =0:021=mL reaction ¼ specific activity: For determining steady-state kinetic constants, the above protocol was used as a function of [NADH]. A total of 4.8 nM GcoB (referenced to [FAD]) and 43 µM cytochrome c were dissolved in buffer (25 mM HEPES, 50 mM NaCl, pH 7.5), 25°C , and the reaction was initiated with the addition of 2.5-200 µM NADH. Initial velocities (v i ) were determined from linear fits to the initial portion of the progress of reaction data, plotted as a function of [NADH], and fit to the Michaelis-Menten Eq. (5) using the KaleidaGraph software: Steady-state kinetics analysis of GcoAB. The demethylation of guaiacol and substrate analogs was continuously monitored under steady-state conditions. A total of 0.2 µM each of GcoA and GcoB were dissolved in air-saturated buffer (25 mM HEPES, 50 mM NaCl) in a cuvette at pH 7.5, 25°C. A total of 100 µg/mL catalase was added to each reaction to capture any H 2 O 2 formed during the uncoupled reaction. A saturating amount of NADH (≥5K M , 300 µM) was added and a background rate of NADH oxidation in air (>200 µM O 2 ) recorded via continuous scanning of the UV/vis spectrum. A total of 20-300 µM guaiacol (preferred substrate) or an alternate substrate from a 2.5 mM stock dissolved in DMSO was added and the reaction was monitored via measurement of UV/vis spectra for several minutes. The initial velocity was determined by disappearance of the characteristic NADH absorbance at 340 nm (ε 344 = 6.22 mM −1 cm −1 ). A plot of v i vs [guaiacol] was fit to Eq. (2) to obtain k cat , K M , and k cat /K M . For specific activity determination, the above method was used but with saturating (300 µM) guaiacol. The linear portion of [NADH] vs time was fit and referenced to the amount of GcoA used (0.2 µM). Reported values are the average of ≥3 measurements and reported errors are standard deviations. For vanillin, whose UV/vis spectrum overlaps with that of NADH, fluorescence was used to monitor NADH disappearance using a FluoroMax3 instrument (Horiba Jobin Yvon). A standard curve for NADH (0-350 µM) was generated by exciting the sample (25 mM HEPES, 50 mM NaCl pH 7.5) at 340 nm and monitoring the emission at 458 nm (vanillin did not excite or emit at this wavelength). The intensity vs [NADH] was plotted and fit to a 4-parameter logistic equation: where a is the theoretical response at [NADH] = 0, b is the slope of the curve at the inflection point, c is [NADH] at the inflection point, and d is the theoretical response at infinite [NADH]. Reactions with vanillin were performed in the same manner described above.
Determination of substrate dissociation constants with GcoA. A total of 0-60 µM of substrate analogs, in 0.25 or 0.5 µM aliquots, were titrated into a cuvette containing 1-6 µM GcoA in 25 mM HEPES, 50 mM NaCl, pH 7.5. The spectrum after each substrate addition was recorded, beginning with no substrate bound. The solution reached equilibrium before the next addition. A difference spectrum was made to illustrate the shift from a low-spin aquo-heme complex to the high-spin substrate-bound complex (spectral shift from 417 nm to 388 nm). The resulting difference spectra showed a peak at 388 nm, and a trough at 420 nm. The absorbance at 388 nm was plotted as a function of [substrate], yielding a quadratic curve that was fit to Eq. (7) to determine the K D .
Where L 0 , E t , K D , and ΔAbs max are the ligand concentrations, total protein (subunit) concentration, the equilibrium dissociation constant, and the maximum Abs 388-417 nm , respectively. Reported values are the average of 2 or more measurements.
Since the UV/vis spectra of vanillin and heme overlap, fluorescence quenching was monitored to determine the K D . 4 µM GcoA (25 mM HEPES, 50 mM NaCl, pH 7.5) was excited at 283 nm and the emission peak read at 340 nm; vanillin does not show the same excitation/emission pattern. Fluorescence emission intensity was followed upon titration of vanillin (0-350 µM) after equilibrium had been established. The % fluorescence quenched, Eq. (8), was plotted as a function of vanillin and fit to Eq. (7) above, replacing ΔAbs max with ΔFluorescence max .
Aldehyde product determination. For the quantification of formaldehyde (demethylation) or acetaldehyde (de-ethylation) production, the reaction was monitored via UV/vis or fluorescence (in the case of vanillin) using the same set of conditions as outlined above for specific activity determination. After eight minutes, aliquots of the sample were removed and carried onto the respective aldehyde detection assay. For reactions that produced formaldehyde (e.g., all substrates except guaethol), a colorimetric tryptophan assay was used, described previously 50  [Acetaldehyde] produced during the dealkylation of guaethol was also determined by using a colorimetric assay and a generated acetaldehyde standard curve. A kit from BioAssays was used. Briefly, 20 µL of the reacted sample was transferred to a 96-well plate and 80 µL of the working reagent, consisting of NAD/ MTT and aldehyde dehydrogenase, was added. The reaction was incubated for 30 min at room temperature and the absorbance read at 565 nm. Aldehyde dehydrogenase and NAD react with acetaldehyde to produce acetic acid and NADH. The NADH can then reduce MTT, resulting in the absorbance at 565 nm. Control reactions included samples without substrate and/or aldehyde dehydrogenase. Reported values are the average of ≥3 measurements. A DAD wavelength of 225 nm was used for analysis of the analytes of interest. In addition, 210 nm was used for protocatechuic, 4-hydroxybenzoic acid, guaiacol, syringol, veratrole, guaethol, vanillin, vanillic acid, catechol, 3-methoxycatechol, anisole, 2-methylanisole, dihydroxybenzaldehyde, phenol, and 2-methylphenol and 325 nm was used for p-coumaric acid, ferulic acid, caffeic acid, and pyrogallol. A minimum of 3-5 calibration levels was used with an r 2 coefficient of 0.995 or better for each analyte and a check calibration standard (CCS) was analyzed every 10 samples to insure the integrity of the initial calibration.
Crystallization and structure determination. Purified protein was buffer exchanged into 10 mM HEPES pH 7.5 and concentrated to A 280 values of 12 and 5 for GcoA and GcoB, respectively, as measured on a NanoDrop 2000 spectrophotometer (Thermo Fisher). Crystals of GcoA were grown with 1.8 M sodium malonate and 20 mM substrate. GcoB crystals were grown in the Morpheus Screen 0.09 M halogens mix (0.3 M sodium fluoride; 0.3 M sodium bromide; 0.3 M sodium iodide), 0.1 M buffer system 3 (1 M Tris (base); BICINE pH 8.5) and 60% v/v precipitant mix 1 (40% v/v PEG 500 MME; 20% w/v PEG 20000) (Molecular dimensions). Crystals in both crystallization conditions were successfully cryocooled directly in liquid N 2 without further addition of cryoprotectants. All data were collected at Diamond Light Source (Harwell, UK). The complex of GcoA with guaiacol was collected on I04 and the phases were solved with a single SAD dataset at a wavelength corresponding to the Se edge. Data from crystals of GcoA with guaethol, syringol, and vanillin were collected on I03 and the data from GcoB crystals were collected on I04. Data were processed with using the CCP4 51 and Phenix 52 suites and details are provided in Supplementary Methods. Data collection and refinement statistics can be found in Supplementary Table 2. Structural images were generated using PyMOL ((http://www.pymol.org) and surface charge calculated using DelPhi 53 .
Dynamic light scattering. Dynamic light scattering experiments were performed using a Protein Solutions DynaPro MSTC800 instrument operated through the Dynamics version 5.26.60 software package (Protein Solutions). Samples were passed through a 0.1 μm filter prior to measurement at 20°C. Results were taken from at least 20 measurements and the data were analyzed using the Dynamics software. The molecular weights of the proteins were estimated using the empirical equation for a globular protein: where M r = the molar mass of the protein in kilodaltons and R h = the hydrodynamic radius of the protein in nm.
Analytical ultracentrifugation. Velocity analytical ultracentrifugation was performed using a Beckman XL-A analytical ultracentrifuge with an An50-Ti rotor. Double-chamber Epon cells were used with 1.2 cm path lengths and quartz window assemblies. Protein concentration was measured at 37.5 μM with a 1:0.9 ratio of GcoA:GcoB for the GcoAB run in buffer containing 25 mM HEPES pH 7.5 and 50 mM NaCl. Samples were equilibrated at 20°C at 3000 rpm before accelerating to 40,000 rpm and taking 72 radial scans at 20 min intervals at a wavelength of 280 nm. Sednterp 54 was used to calculate buffer viscosity and density, v and of the protein sample and Sedfit 55 was used to analyze the scans, solve the Lamm equation, perform c(s) size-distribution analysis and determine f/f 0 .
MD simulation and DFT systems setup. Molecular dynamics simulations were performed with GcoA in the following conditions: complexed to guaiacol (GcoA: guaiacol), guaethol (GcoA:guaethol), syringol (Gcoa:syringol), vanillin (GcoA: vanillin), catechol (GcoA:catechol), and in absence of ligand (GcoA:apo). For GcoA:guaiacol, GcoA:guaethol, GcoA:syringol and GcoA:vanillin, we used the crystal structures reported in this work as starting point. GcoA:catechol and GcoA: apo systems were built from the GcoA:guaiacol structure by modifying and removing the guaiacol substrate, respectively. The heme group was considered in the pentacoordinate state in the GcoA:apo and GcoA:catechol systems, and in the hexacoordinate (compound I) state in the GcoA:guaiacol, GcoA:guaethol, GcoA: syringol and GcoA:vanillin systems. As the GcoA:vanillin crystal structure lacks residues 388 and 389, these were taken from the GcoA:syringol structure after structural alignment. Hydrogen atoms were added and the protonation states of titratable residues were estimated using H++ at pH 7.5 56,57 . At this condition, all Asp and Glu residues were considered deprotonated (bearing a -1 charge); His131, His221, His224, His255, and His343 were considered doubly protonated (+1 charge); and His80, His91, His213, His329, His349, His357, His367, and His405 were considered protonated only at the ε position (0 charge). The systems were then immersed in a rectangular water box extended at least 15 Å from the protein, and Na + cations were added to neutralize the system. The final systems comprised 74,000 atoms. Force field parameters for guaiacol, guaethol, syringol and vanillin were obtained from Generalized Amber Force Field (GAFF) 58 , with Restrained Electrostatic Potential (RESP) partial charges 59 derived at the HF/6-31 G(*) level with Gaussian 09 60 . Force field parameters for the heme group, both in the pentacoordinate state and compound I were taken from from Shahrokh et al. 61 . For the protein, the ff14SB Amber force field 62 was employed along with the TIP3P water model 63 . The simulations were performed using periodic boundary conditions, with short-range interactions truncated at a cutoff radius of 8.0 Å and particle mesh Ewald (PME) for long-range interactions 64 . The equations of motion were integrated with a time-step of 2.0 fs, with bonds involving hydrogen atoms constrained at their equilibrium values using SHAKE. The temperature was kept constant at 300 K using the Langevin thermostat with a collision frequency of 1.0 ps −1 . The pressure was controlled at 1.0 bar only during the initial equilibration steps (described below) with the Berendsen barostat using a relaxation time of 2.0 ps.
To prepare the systems for productive MD simulations, the following steps were carried out: (1) 2000 steps of energy minimization, with all the protein atoms restrained; (2) 2000 steps of energy minimization, with all the protein Cα atoms restrained; (3) 2500 steps of energy minimization without any restraints; (4) 100 ps of constant volume simulation, with no restraints, and with the temperature increasing at constant rate of 3 K/ps from 0 to 300 K; (5) 500 ps of constant pressure simulation with the temperature kept at 300 K; (6) 400 ps of constant volume and temperature simulation. After these steps, 1.0 μs of MD simulation was carried out with constant temperature and volume. For each system, such procedure was repeated three times. PMEMD, from the Amber16 package 65 , was used to perform all the equilibrium MD simulations. the most open structure obtained from the unbiased simulation of GcoA:apo, and RMSD(closed) is the RMSD measured from the crystal structure. The US simulations were performed using the colvar module of NAMD 2.12 66 (with the same Amber force field employed for the unbiased simulations). In the RMSD calculations, only the Cα atoms of residues 1-31 and residues 150-206 were considered. These residues correspond to the region involved in the open-close motions of GcoA. We split the conformational space into 28 equally spaced windows, centered at ξ values between -4.5 Å and 3.6 Å, therefore with increments of δξ = 0.3 Å. Simulations within each window were restrained with a harmonic potential of the form (½)k(ξ-ξ0)2, with k = 10 kcal mol −1 Å −2 . Simulation in an additional window centered at 2.2 Å and with k = 10 kcal mol −1 Å −2 was conducted to assure enough overlap between neighboring windows. Therefore, a total of 29 windows were used for the PMF calculation. Within each window, 260 ns of restrained MD simulation was carried out after a 10-ns equilibration (not considered in the PMF calculation), totalizing 270 ns/ window. Initial configurations for the different windows of the GcoA:apo US simulations were taken from the unbiased simulation where we observed the closed-to-open transition. For the GcoA:guaiacol and GcoA:catechol systems, we started from the closed structure and followed a scheme where we used the equilibrated configuration of the previous window (window i) to start the simulation of the next window (window i + 1). The PMFs were obtained as the average of PMFs calculated for blocks of 10 ns. The Weighted Histogram Analysis Method 67 was employed to reweight the biased histograms obtained with US MD.
Density functional theory calculations. DFT calculations were performed using Gaussian 09 60 . A truncated model containing the porphyrin pyrrole core, Fe center and a methanethiol to mimic cysteine as Fe-axial ligand was used. Geometry optimizations and frequency calculations were performed using unrestricted B3LYP (UB3LYP) 68,69 with the LANL2DZ basis set for iron and 6-31G(d) on all other atoms. Transition states had one negative force constant corresponding to the desired transformation. Enthalpies and entropies were calculated for 1 atm and 298.15 K. A correction to the harmonic oscillator approximation, as discussed by Truhlar and co-workers, was also applied to the entropy calculations by raising all frequencies below 100 cm -1 -100 cm -170 . Single point energy calculations were performed using the dispersion-corrected functional (U)B3LYP-D3(BJ) 71,72 with the LANL2DZ basis set on iron and 6-311+G(d,p) on all other atoms, within the CPCM polarizable conductor model (diethyl ether, ε = 4) 73,74 to have an estimation of the dielectric permittivity in the enzyme active site. The use of a dielectric constant ε = 4 has been shown to be a good model to account for electronic polarization and small backbone fluctuations in enzyme active sites 75 . All stationary points were verified as minima or first-order saddle points by a vibrational frequency analysis. Computed structures are illustrated with CYLView.
Data availability. Coordinates and associated structure factors have been deposited with the PDB (www.rcsb.org/) under accession codes 5NCB, 5OMR, 5OMS, 5OMU, 5OGX. Data supporting the findings of this study are available within the article (and its Supplementary Methods files) and from the corresponding authors upon reasonable request.