Structural insights into bifunctional thaumarchaeal crotonyl-CoA hydratase and 3-hydroxypropionyl-CoA dehydratase from Nitrosopumilus maritimus

The ammonia-oxidizing thaumarchaeal 3-hydroxypropionate/4-hydroxybutyrate (3HP/4HB) cycle is one of the most energy-efficient CO2 fixation cycles discovered thus far. The protein encoded by Nmar_1308 (from Nitrosopumilus maritimus SCM1) is a promiscuous enzyme that catalyzes two essential reactions within the thaumarchaeal 3HP/4HB cycle, functioning as both a crotonyl-CoA hydratase (CCAH) and 3-hydroxypropionyl-CoA dehydratase (3HPD). In performing both hydratase and dehydratase activities, Nmar_1308 reduces the total number of enzymes necessary for CO2 fixation in Thaumarchaeota, reducing the overall cost for biosynthesis. Here, we present the first high-resolution crystal structure of this bifunctional enzyme with key catalytic residues in the thaumarchaeal 3HP/4HB pathway.

www.nature.com/scientificreports/ All known ammonia-oxidizing Thaumarchaeota fix CO 2 with a modified version of the 3HP/4HB cycle originally described in Crenarchaeota and further characterized in N. maritimus 1,19,20 (Fig. 1). This cycle can be broken down into two characteristic halves: (1) In the first part of the cycle, one acetyl-CoA and two bicarbonate molecules are converted to succinyl-CoA through 3-hydroxypropionate intermediate; (2) this is followed by the conversion of succinyl-CoA to regenerate acetyl-CoA, producing two molecules of acetyl-CoA by way of a 4-hydroxybutyrate intermediate. From here, one of the acetyl-CoA molecules generated then serves as a carbon precursor for the next round of the cycle 1,21 . The individual steps of this cycle, however, distinguish it from other known aerobic and autotrophic CO 2 fixation pathways, as well as provide further evidence of the differentiation and efficiency of the 3HP/4HB cycle used by Thaumarchaea versus their Crenarchaeal ancestors 1 . Notably, the thaumarchaeal 3HP/4HB cycle requires two fewer high-energy phosphoric anhydride bonds of ATP per acetyl-CoA molecule than the crenarchaeal 3HP/4HB cycle and three fewer than the Calvin-Benson-Bassham cycle 1 .
By breaking down the 3HP/4HB cycle into individual steps, we can focus on the enzymes responsible for catalyzing each respective reaction. In doing so, the structure and kinetics of each enzyme can be determined to provide more insight into how the overall 3HP/4HB cycle in Thaumarchaeota is so energy efficient. Here we have focused on the Nmar_1308 enzyme because it plays an essential bifunctional role in the thaumarchaeal 3HP/4HB cycle, functioning as both as a 3-hydroxypropionyl-CoA dehydratase (3HPD) and crotonyl-CoA hydratase (CCAH) 1,22 . In its capacity as a 3HPD, Nmar_1308 catalyzes the conversion of 3-hydroxypropionyl-CoA to acryloyl-CoA and water. Conversely, as a CCAH Nmar_1308 converts crotonyl-CoA to 3-hydroxybutyryl-CoA through a dehydration reaction. The promiscuity of Nmar_1308 and a second bifunctional enzyme in the N. The dual enzymatic steps catalyzed by Nmar_1308 in the thaumarcheal 3HP/4HB cycle. Representation of 3HP/4HB cycle reactions in the thaumarcheal variant. The reaction showing the energy efficiency of the taumarcheal cycle is colored in red. Highlighted rectangle in red indicates the bifunctional reactions catalyzed by the Nmar_1308 protein. Nmar_1308 is a promiscuous enzyme with dual functions that dehydrates 3-hydroxypropionyl-CoA to acryloyl-CoA via 3-hydroxypropionyl-CoA dehydratase (3HPD) and hydrates crotonyl-CoA to 3-hydroxybutyryl-CoA via crotonyl-CoA hydratase (CCAH). 3HPD and CCAH are colored teal. The figure is created with BioRender.com.

Results and discussion
Apo-form Nmar_1308 structure. The Nmar_1308 gene encodes a bifunctional protein that plays critical roles in the 3HP/4HB cycle by functioning both as 3HPD and CCAH 1,22 (Fig. 1). The apo-form structure of Nmar_1308 was determined as hexameric form at 2.15 Å resolution in space group P3 2 (Fig. 2, Table 1). The homohexameric structure consists of a dimer of trimers arrangement. The asymmetric unit contains 6 Nmar_1308 monomers with a total of 1422 residues. Chain A has 232 modeled residues while chain B has 236; chain C has 237; chain D&E have 230 and chain F has 229 modeled residues. Each subunit contains an active site with catalytic residues Gly109, Glu112 and Glu132 and none of the chains have bound substrates/products or metal ions. These three residues are a part of the well documented canonical active site of bacterial and archaeal enoyl-CoA hydratase/dehydratases, involving two glutamate residues and two backbone amides forming the oxyanion hole ( Fig. 3A-C). www.nature.com/scientificreports/ Water molecule between catalytic residues at the active site. Two catalytic residues Glu112 and Glu132 play key roles during catalysis, one as an acid and the other as a base. In the Nmar_1308 structure, they form hydrogen bonds with a water molecule (W71) (Fig. 3D). Two other chains in the hexamer have water molecules at the same position, chain E (W188) and chain F (W239), while the other three are empty. The position of these water molecules (W71, W188, and W239) are very close to the water positions bridged by the corresponding two glutamic acid side chains of the other hydratase structures 24,25 . This addition of the water to the C3 catalyzes the hydration reaction from crotonyl-CoA to 3-hydroxybutyryl-CoA, and the dehydratase reaction performs in the reverse direction from 3-hydroxypropionyl-CoA to acryloyl-CoA by removal of water (Fig. 3A).
In the hydratase reaction, Glu132 protonates C2 of a ligand while Glu112 activates the water molecule that is to be added to C3 from the other side of the substrate 24,26 . In the reverse reaction, Glu112 acts as an acid attacking the hydroxyl oxygen of C3 for removal of water assisted by Glu132 as a base to extract a proton from C2. The hexamer of Nmar_1308 with three monomers having the water and the other three without may indicate the flexibility of the enzyme in accommodating either substrate for hydratase or dehydratase reactions.
Dynamic oxyanion hole. The apo-form Nmar_1308 crystal structure lacks electron density for residues 62-69 (some monomers lack one or two fewer residues) in 6 monomers, indicating the flexible nature of this loop located on the outer surface of each monomer. This loop is visible in ligand-bound structures of the related hydratases and dehydratases, indicating that the loop is flexible without a bound ligand. The residues near this loop display high B factors in both the apo-form Nmar_1308 and the inhibitor-bound structure of enoyl-CoA hydratase from Rattus norvegicus (PDB ID: 1DUB) (Fig. 3C). In contrast, the core of the enzyme hexamer is rather rigid including the regions in and around the binding pocket for the enoyl-group of substrates (Fig. 3C).
In the proposed hydratase/dehydratase reaction mechanisms, two conserved residues alanine and glycine (Ala64 and Gly109 in Nmar_1308) in the binding pocket act as an oxyanion hole by offering their backbone amides to the carbonyl oxygen (O 1 ) of the substrate and stabilizing the negative charge accumulated on O1 oxygen in Table 1. Data collection and refinement statistics of X-ray crystallography data collection and structure refinement. Numbers corresponding to the highest resolution shell are shown in parentheses.   www.nature.com/scientificreports/ the transition state. Interestingly, Ala64 is part of the abovementioned flexible loop disordered in the apo-form Nmar_1308 structure while Gly109 is at the N-term end of a relatively rigid α3 helix. Again, in the holo-form of the hydratase and dehydratase structures, the conserved alanine of the oxyanion hole is stable. Taken together, the oxyanion hole is dynamically assembled when a substrate is bound at the active site and disassembled as the product leaves.

Data collection
Comparison with other hydratase/dehydratase structures and substrate specificity. To further confirm these notions of the active site architecture and flexibility, we then expanded the structure comparison to representative homologous structures (Fig. 4). Superposition of the Nmar_1308 structure with 11 available structures indicates that there are minor conformational differences between Nmar_1308 protein and homologous hydratase/dehydratases structures. These structures were classified based on their catalytic properties (hydratase vs. dehydratase) or their complex states (apo or holo). Among these structures, hydratases in holo form (PDB IDs: 1MJ3, 1DUB, 2DUB, 1EY3, 3Q0G) and in apo form (PDB IDs: 3PZK, 3Q0J, 3H81, 5XZD); and a dehydratase in complex with its ligand (holo form) were (PDB IDs: 5JBX) superposed with the Nmar_1308 structure with the indicated root mean squared deviation (RMSD) values. Msed_2001 (PDB IDs: 5ZAI) is shown as bifunctional 22,26 . It is notable that the RMSD value between Msed_2001 and Nmar_1308 is quite small, 0.91 Å, the lowest among the hydratase/dehydratase compared in this study (the other RMSD are 1.04 Å to 1.22 Å).
To provide a better understanding of these conformational differences and highlight the importance of these specific residues (which may define Nmar_1308 as a bifunctional enzyme), a multiple sequence alignment was performed using Jalview 27 (Fig. 5). The structure-based sequence alignment analysis shows the residues critical for the hydratase/dehydratase reactions, Ala64, Gly109, Glu112 and Glu132 are well conserved (Fig. 5A,B), and consistent with the active site residues reported earlier 26,28,29 .
The structure-based multiple sequence alignment shows how Nmar_1308 restricts the range of short enoyl-CoA and 3-hydroxy-acyl-CoA substrates using Trp141 in the enoyl-CoA binding pocket. In other short enoyl-CoA hydratases this residue is Tyr142 in 3HPD of M. sedula (PDB ID: 5ZAI) 26 . These bulky residues serve as a limiter of the substrate length by closing the bottom of the substrate enoyl-CoA binding pocket. In contrast, smaller or thinner side chains such as alanine (Rattus norvegicus (2DUB)), methionine (Roseovarius nubinhibens (5XZD)), Myxococcus xanthus (5JBX)), and glycine (Mycobacterium tuberculosis H37Rv (3Q0J)) can allow for longer chain enoyl-CoA as substrates. In addition, Trp231 provides additional hydrophobic constriction from helix α11 of the neighboring monomer, hence the trimer is the minimal biological enzyme unit for hydratase and dehydratase functions. This is well conserved between long and short chain enoyl-CoA hydratases/dehydratases; tryptophan, phenylalanine, or tyrosine. These residues play important roles to define substrate specificity. www.nature.com/scientificreports/ Extended hydrogen bond network at the active site of dual function hydratase/dehydratase. The multiple sequence alignment also suggests that there are several less well conserved residues: Cys111, Ser116, and Gln130 (Fig. 5A,C). They might play a role for assisting the bifunctionality of this protein, as they are in the vicinity of the active site of the hydratase/dehydratase (Fig. 6, Table 2). For example, while Cys111 is conserved in homologous hydratase structures 29,30 , dehydratase structures show either threonine or leucine 26,31 . The sulfur atom of Cys111 is actually very close to side chain oxygen of Gln130 (distance of 3.5 A).
In the structures where Cys111 is replaced with leucine, the residue corresponding to Gln130 is replaced by another leucine, hence forming a hydrophobic interaction. These Cys111-Gln130 or Leu-Leu/Thr pairs are probably the results of co-evolution of these two residues to keep the stable active site architecture (Fig. 6, Table 2). Another residue which shows variation is Ser116 that is one turn upstream of the catalytically important Glu112 along α3 helix, partnering in the hydrogen bond network with Glu112. The serine is replaced by a methionine in homologous hydratase structures and alanine in the homologous dehydratase structure, both of which may not be able to participate in the hydrogen bond network of Glu112. Gln130 is near the other glutamic acid Glu132 with a hydrogen bond between their oxygen atoms, hence participating indirectly in the hydratase/dehydratase reaction. This glutamine is conserved in homologous hydratase structures while both dehydratase and bifunctional enzyme have a leucine (Leu130 for dehydratase from Myxococcus xanthus (5JBX), and Leu 131 for the bifunctional enzyme from Metallosphaera sedula (5ZAI)). These leucine residues will not be able to participate in the hydrogen bond network. In summary, this study has provided the first high-resolution crystal structure of an enzyme within the thaumarchaeal 3HP/4HB cycle. Thaumarchaeota not only represent one of the most abundant microbial groups on the entire planet, but this CO 2 fixation pathway is one of the most energy-efficient pathways known. Thus, the molecular insights into the bifunctional Nmar_1308 enzyme obtained in this study have implications for understanding the global C cycle (e.g., dark CO 2 fixation) as well as for bioengineering applications using synthetic biology approaches.  Protein expression and purification. Overnight cultures started with single colonies from agar plates in LB media, followed with 1:100 dilution into 2L cultures for large-scale protein purification. After cultures reached an optical density of 0.8 at 600 nm, IPTG (final concentration 0.7 mM) was added to induce protein expression. Cultures were incubated overnight at 16 °C, and cell pastes were collected after centrifugation at 4000 rpm. The cells were resuspended in a lysis buffer (pH 7.0, 1 M KCl), and sonicated. Following another centrifugation, soluble protein solution was collected and kept at 4 °C. Protein was further purified using Ni-NTA affinity resin (GE Healthcare). The column was first washed with a binding buffer containing pH 7.0, 300 mM NaCl, 20 mM TRIS, and bound protein was eluted in elution buffer containing 500 mM imidazole, 300 mM NaCl, 50 mM TRIS, pH 7.5. Thrombin was then added to cleave N-terminal histidine tag followed by a reverse Ni-NTA chromatography to remove the cleaved hexa-histidine tag. The cleaved fractions containing the enzyme without any affinity tag were pooled, concentrated and applied to the S200 (Superdex 200) column as a final purification step. Concentration of the final protein solution before crystallization was measured using UV absorption at 280 nm (Nanodrop UV spectra) and the purity is determined by SDS-PAGE. Protein solution concentrated to 10 mg/ml-confirmed by Nanodrop UV spectra (280 nm).
Crystallization. The purified protein solution was added to crystal screen conditions in Terasaki plates (BioExpress), in 1:1 (v/v) ratio and sitting drop geometry. For the quaternary complex, protein solution was first mixed with equimolar amounts of 3-hydroxypropionate, coenzyme-A, and ATP. Initial screens produced no crystals, in apo-enzyme; beta-mercaptoethanol (BME) was added in both contexts, to reduce possible disulfide bonds between oligomers. Following the BME addition, crystals were obtained in both contexts. The apo-enzyme crystallized exclusively in 0.2 M MgCl 2 , 0.1 M sodium citrate tribasic dihydrate (pH 5), 10% w/v PEG. To prepare for diffraction experiments at 100 K, apo-enzyme was soaked overnight in 2.5 µl of 33% glycerol cryoprotectant (v/v) and 77% (v/v) mother liquor. Following cryoprotection, crystals were flash-frozen in liquid nitrogen.
Data collection and processing. Synchrotron X-ray diffraction images were collected from a single crystal of Nmar_1308 using the Pilatus 6 M detector at beamline BL12-2 of the Stanford Synchrotron Radiation Lightsource (SSRL), SLAC National Accelerator Laboratory in Menlo Park, CA, USA. The diffraction data were collected to 2.1 Å resolution with unit cell dimensions a = 75.87 Å b = 75.87 Å c = 207.96 Å α = 90 β = 90 γ = 120, the space group P3 2 at a wavelength of 1.0000 Å and temperature 100 K. The diffraction data were processed with the XDS package for indexing and scaled by using XSCALE 32 .
Structure determination and refinement. Apo-Nmar_1308 structure was determined by using the automated molecular replacement program PHASER 33 implemented in PHENIX software 34 with the previously published structure as a search model and initial rigid body refinement (PDB ID: 2DUB). After simulatedannealing refinement, individual coordinates and TLS parameters were refined. Potential positions of altered side chains and water molecules were checked by using the program COOT 35 . Then, positions with strong difference density were retained. The Ramachandran statistics for cryo-synchrotron structure (most favored/ additionally allowed/disallowed) were 94/5/0% respectively. Monomer-monomer superposition was performed with the similar structures of Nmar_1308 and root-mean-squared deviations (RMSDs) were calculated based on C-alpha atom positions. Average RMSD values were calculated by taking the average of each pairwise alignment RMSD between one chain of Nmar_1308 and another from the superposed homologous structure (Fig. 4). For the overall RMSD, the average RMSDs for each chain is averaged again. Standard errors (SEs) for the average RMSDs were calculated from the standard deviations of the pairwise RMSD values between two proteins using SE = (standard deviation of pairwise RMSDs)/sqrt (N), N is the number of monomer pairs used, 6 × 6 = 36 for hexamers and 6 × 3 = 18 for trimers. SE values range from 0.0003 Å to 0.008 Å. This average RMSD value between different monomers within the Nmar_1308 hexamer range from 0.16 to 0.24 A, substantially smaller than those between Nmar_1308 and the superposed homologous structures. Sequence alignment was performed by using homologous sequences from different organisms via Jalview software 27 . During the alignment, ClustalW 36 was used as an algorithm with default parameters of Jalview. All figures were generated by PyMOL 37 (www. schro dinger. com/ pymol).