The crystal structure of D-xylonate dehydratase reveals functional features of enzymes from the Ilv/ED dehydratase family

The Ilv/ED dehydratase protein family includes dihydroxy acid-, gluconate-, 6-phosphogluconate- and pentonate dehydratases. The members of this family are involved in various biosynthetic and carbohydrate metabolic pathways. Here, we describe the first crystal structure of D-xylonate dehydratase from Caulobacter crescentus (CcXyDHT) at 2.7 Å resolution and compare it with other available enzyme structures from the IlvD/EDD protein family. The quaternary structure of CcXyDHT is a tetramer, and each monomer is composed of two domains in which the N-terminal domain forms a binding site for a [2Fe-2S] cluster and a Mg2+ ion. The active site is located at the monomer-monomer interface and contains residues from both the N-terminal recognition helix and the C-terminus of the dimeric counterpart. The active site also contains a conserved Ser490, which probably acts as a base in catalysis. Importantly, the cysteines that participate in the binding and formation of the [2Fe-2S] cluster are not all conserved within the Ilv/ED dehydratase family, which suggests that some members of the IlvD/EDD family may bind different types of [Fe-S] clusters.

A catalytically active [4Fe-4S] cluster that is unstable in aerobic environments has been reported in 6PGDHT from Zymomonas mobilis 26 and in DHADHT from Escherichia coli 23. In contrast, a functional and stable [2Fe-2S] cluster has been observed in DHADHTs from several organisms, including bacteria, archaea, fungi and plants 24,25.
A detailed structural and functional characterization of these enzymes is important for understanding their catalytic mechanism and substrate specificity, which could help improve synthetic metabolic pathways in biorefinery industries. However, to date, only limited three-dimensional structural information is available for the IlvD/EDD enzymes. The coordinates of Shewanella oneidensis 6PGDHT in the apo form have been submitted to the Protein Data Bank (PDB code: 2GP4), but the structure has not yet been published. Moreover, this 6PGDHT structure is incomplete and partially incorrect: the original 2GP4 model contains a mixed β-sheet in the N-terminal domain, whereas the corrected model contains a parallel β-sheet. The model also lacks the cofactor and several surrounding loops in the active site. Recently, we solved the first structure of a representative of the IlvD/EDD family in its holo form, an L-arabinonate dehydratase from Rhizobium leguminosarum bv trifolii (RlArDHT, PDB code: 5J84). This structure revealed the presence of a [2Fe-2S] cluster and a Mg2+ ion in the active site 27.
In this paper, we describe the first crystal structure of a D-xylonate dehydratase at 2.7 Å resolution and compare it with the two other known crystal structures (PDB codes: 2GP4 and 5J84) belonging to the IlvD/EDD family. D-xylonate dehydratase from Caulobacter crescentus (CcXyDHT) shares 42% sequence identity with RlArDHT, and the crystal structure of CcXyDHT shows that this enzyme has a [2Fe-2S] cluster and a Mg2+ ion in its active site, similar to RlArDHT. The three different enzyme structures available from the IlvD/EDD enzyme family now allow an analysis of the structural and functional features within this protein family.

Results and Discussion
The re-refined structure of So6PGDHT. The crystal structure of 6-phosphogluconate dehydratase from Shewanella oneidensis (So6PGDHT) at 2.5 Å resolution (Rwork = 0.23 and Rfree = 0.29) is available in the Protein Data Bank (PDB code: 2GP4), but the structure lacks a [Fe-S] cluster and a divalent metal ion. When inspecting the electron density map of So6PGDHT, we found several poorly defined regions in the crystal structure. Therefore, the 2GP4 model was re-refined, which resulted in improved R-values (Rwork = 0.17 and Rfree = 0.23). However, the improved model of So6PGDHT still had missing regions (Val35-Leu63, Lys182-Ile185 and Thr215-Thr226), and it lacked density for both the [Fe-S] cluster and the Mg2+ ion in the active site. The re-refinement resulted in new locations for the active site residues Cys112 and Asp113. In the re-refined model, the putative iron-sulphur cluster binding residues Cys112 and Cys154 are sufficiently close to ligate a [Fe-S] cluster. The location of the third putative cluster-binding residue, Cys220, is undefined because it lies in the missing Thr215-Thr226 loop, for which there is no electron density.
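For readers less familiar with the R-values quoted above, the following is a minimal sketch of the standard crystallographic residual, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|. Rwork is evaluated over the reflections used in refinement and Rfree over a small set held out from it. The amplitude lists are hypothetical illustration values, not data from this study, and the sketch assumes the calculated amplitudes are already on the same scale as the observed ones.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic residual R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|).

    Assumes f_calc is already scaled to f_obs (no scale factor is fitted here).
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical structure-factor amplitudes, for illustration only.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 66.0, 36.0]

r = r_factor(f_obs, f_calc)
print(f"R = {r:.3f}")
```

A lower value indicates closer agreement between the model and the measured data, which is why the drop from 0.23 to 0.17 in Rwork is described as an improvement.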
Overall structure of CcXyDHT. The crystal structure of holo-CcXyDHT was determined at 2.7 Å resolution by molecular replacement, using L-arabinonate dehydratase from Rhizobium leguminosarum bv trifolii (5J84) as a template. The final structure had good R-factor and R-free values of 17.8% and 22.3%, respectively. The structure contained a tetramer in the asymmetric unit in the space group C2. Unambiguous electron density was seen for all of the amino acid residues of CcXyDHT, except for the disordered region at the N-terminus, which includes Asp1-Ser2 and the N-terminal Strep-tag II. The refinement resulted in a final model with good refinement statistics (Table 1).
The enzyme is an α/β protein that consists of two domains: an N-terminal domain (residues Asn3-Leu358) and a C-terminal domain (residues Leu383-His591) (Fig. 2a,b). The two domains are connected by a long loop from Gln359 to Phe382. The N-terminal domain is composed of a β-sheet with four parallel β-strands surrounded by four α-helices. In addition, the N-terminal domain contains an extension at the N-terminus (residues Arg10-Ser43) with three α-helices and an insertion comprising a β-hairpin (residues Gly157-Val164) and a helix-loop-helix (residues Thr168-Ser194). The C-terminal domain is composed of a β-sheet consisting of six parallel and two anti-parallel β-strands that are arranged like a β-barrel. Secondary structure analysis showed that the complete polypeptide chain consists of 27 α-helices and 16 β-strands. Each monomer contains one [2Fe-2S] cluster and a Mg2+ ion. The substrate binding site is located in the cavity between the domains, but the [Fe-S] cluster and the Mg2+ binding site are located entirely in the N-terminal domain (Fig. 2c).
Quaternary structure. The quaternary structure of CcXyDHT is a homotetramer (Fig. 2d), similar to the homotetrameric structure of RlArDHT. D-gluconate dehydratase from Achromobacter xylosoxidans has also been reported to form a tetramer in solution 22. A packing analysis of the crystal structure using the PISA server 28 suggested that the quaternary structure contains a stable dimer with an extensive monomer-monomer interface (4640 Å²). This interface is predominantly formed by residues from the larger N-terminal domain. An α-helix (residues 99-115) from this domain forms the central part of the interface. The tight dimer then packs against another similar dimer in such a way that each monomer has an interface with both polypeptide chains of the second dimer (887 and 607 Å²). The total monomer-monomer interface area in the tetramer is 12268 Å².
In contrast, So6PGDHT forms a dimer in the crystal structure that is similar to the tight dimer of CcXyDHT (Fig. 2e,f). However, unlike CcXyDHT, So6PGDHT does not form a tetrameric structure. Many IlvD/EDD enzymes, such as L-arabonate dehydratase from Azospirillum brasilense 20, dihydroxy acid dehydratase from Spinacia oleracea 24, and 6-phosphogluconate dehydratase from Zymomonas mobilis, have been reported to form dimers in solution 29.
The formation of the extensive dimer interface in CcXyDHT (and in RlArDHT) is essential for the formation of the active site. Although the residues that participate in the binding of Mg2+ and the [2Fe-2S] cofactor are located in the N-terminal domain of the first monomer, the polypeptide chain from the second monomer covers the active site by extending its N-terminal helix (residues 21-33) into the first monomer. Because the amino acid residues of this helix that point towards the active site differ between CcXyDHT and RlArDHT, this helix may be called a substrate recognition helix. In addition, the C-terminus of the dimeric counterpart protrudes into the active site of the first monomer, placing His591, which forms a salt bridge with Glu463, in the active site (Fig. 2e). The crystal structure of So6PGDHT shows dimer formation similar to that observed in CcXyDHT but does not show any evidence that the polypeptide chain from the dimeric counterpart participates in the formation of the active site. However, the So6PGDHT structure represents the apo form, and some of the loop regions are not visible in the electron density map due to disorder in the absence of the cofactor (Fig. 2f).
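The reported total interface area can be reproduced from the individual values given above. The sketch below assumes that the tetramer contains two tight dimers and that each of the two dimer-dimer contact types (887 and 607 Å²) occurs twice, as expected for a symmetric dimer of dimers. This multiplicity is an assumption made for the arithmetic, not a statement taken from the PISA output.

```python
# Interface areas in square angstroms, as quoted in the text.
TIGHT_DIMER_INTERFACE = 4640          # within each tight dimer
CROSS_DIMER_INTERFACES = (887, 607)   # one monomer against each chain of the other dimer

# Assumed multiplicities for a symmetric dimer of dimers.
N_TIGHT_DIMERS = 2
COPIES_PER_CROSS_CONTACT = 2

total_interface = (
    N_TIGHT_DIMERS * TIGHT_DIMER_INTERFACE
    + COPIES_PER_CROSS_CONTACT * sum(CROSS_DIMER_INTERFACES)
)
print(total_interface)  # 12268, matching the reported total
```

Under these assumptions, roughly three quarters of the buried interface in the tetramer comes from the two tight dimers, consistent with the description of the dimer as the stable unit.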

Structural comparison of three IlvD/EDD enzymes. The sequence of CcXyDHT is 42% identical to RlArDHT and 27% identical to So6PGDHT. A sequence alignment of these three enzymes and of representative members of the IlvD/EDD family is shown in Fig. 3. The higher sequence identity between CcXyDHT and RlArDHT, compared to So6PGDHT, is also reflected in a low RMS deviation of 0.70 Å between the Cα atoms of the monomers. The overall structures are very similar except for a small insertion of a β-hairpin at Leu409-Val417 in CcXyDHT. The [2Fe-2S] cluster and Mg2+ ion coordinating residues are well conserved between these enzymes. The improved model of So6PGDHT can be superimposed on CcXyDHT with an RMSD of 1.95 Å. The N-terminal core domain, consisting of a four-stranded parallel β-sheet surrounded by α-helices, and the C-terminal core, containing a mixed eight-stranded β-sheet, are similar. However, the large RMSD value between So6PGDHT and CcXyDHT is due to the following major differences between the structures: 1) the N-terminal regions, 2) the helix-loop-helix motifs and 3) the β-hairpin structures (Fig. 4).
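The Cα RMS deviations quoted above (0.70 Å and 1.95 Å) are root-mean-square distances between equivalent Cα atoms after superposition. The following minimal sketch shows only the final RMSD step and assumes that the two coordinate sets are already superimposed and paired residue by residue. The coordinates are hypothetical toy values, and the superposition itself (for example by a least-squares fit) is not implemented here.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMS deviation between paired, already superimposed Cα coordinates (in Å)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical Cα positions; every pair is exactly 1 Å apart.
model_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model_b = [(0.0, 1.0, 0.0), (3.8, 0.0, 1.0), (7.6, 1.0, 0.0)]

rmsd = ca_rmsd(model_a, model_b)
print(f"RMSD = {rmsd:.2f} Å")
```

Because the deviations are squared before averaging, a few strongly displaced regions, such as the helix-loop-helix motif discussed in this section, can dominate the overall value even when the core domains superimpose well.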
The N-termini of CcXyDHT and So6PGDHT are very different. In CcXyDHT, this region (residues 1-45) forms a bundle of three short helices, but in So6PGDHT, this region is longer (residues 1-65) and contains two long α-helices, which have a very different orientation and pack against the core protein in an extended fashion. A part of the N-terminal region (residues 35-63) is undefined in the structure of So6PGDHT. Perhaps the most remarkable structural difference between the proteins is the location of the Pro186-Ser214 region, which contains a helix-loop-helix motif in the N-terminal domain of So6PGDHT. The helix-loop-helix in So6PGDHT is flipped and displaced by approximately 30 Å from the position observed in CcXyDHT (and in RlArDHT), where it partially covers the active site. Interestingly, in an open form of an RlArDHT mutant co-crystallized with L-arabinonate, this helix-loop-helix structure is slightly displaced, by 7 Å, which suggests that it has a functional role in the catalytic cycle 27.
In So6PGDHT, the helix-loop-helix region is twisted away from the active site. Unfortunately, the preceding loop (residues 182-185) and the following loop (residues 215-226) regions are disordered in the So6PGDHT structure, so how accessible the active site of So6PGDHT is compared to those of CcXyDHT and RlArDHT remains an open question. The third difference between So6PGDHT and CcXyDHT is an additional β-hairpin structure (residues 383-396) in the connecting loop of So6PGDHT. This packs against the abovementioned helix-loop-helix structure (Fig. 4a-c).
Sequence motifs to identify the IlvD/EDD family. We have analysed the multiple amino acid sequence alignment of some representative members of the IlvD/EDD family. The sequence alignment is presented in Fig. 3, including the sequences of the three enzymes (CcXyDHT, RlArDHT, So6PGDHT) for which three-dimensional structure information is available. The description of the IlvD/EDD protein family in the PROSITE database 30 reports that this family has two common sequence motifs. The first sequence motif consists of eleven residues with a peptide pattern Cys-Asp-Lys- These two motifs can be found in the CcXyDHT structure (Figs 3, 4d). The first motif is Cys-Asp-Lys-Thr-Thr-Pro-Ala-Gly-Ile-Met-Ala at the N-terminal domain (residues 128-138). These eleven residues are located in the helix α6 and the loop connecting the strand β3 and helix α6. The conserved Cys128 participates in the binding of the [2Fe-2S] cluster, and Asp129 and Lys130 (in carbamylated form) in the binding of Mg2+.

The second motif is Pro-Thr-Leu-Gly-Asp-Gly-Arg-Gln-Ser-Gly-Thr-Ala (residues 482-493) in CcXyDHT. These twelve residues are located in the C-terminal domain, in strand β13 and the loop that follows it. Interestingly, the first two and last two residues of this motif in CcXyDHT do not match the predicted sequence motif. The proposed conserved threonine is not conserved in CcXyDHT and is replaced by Gly485. The conserved Asp486 forms hydrogen bonds with the main chain nitrogens of Gly452 and Ala453 in the loop structure. The conserved Arg488 is located rather close (4.3 Å) to the Mg2+ ion but probably does not directly participate in the binding of magnesium; consequently, its role may be electrostatic. Ser490, which is proposed to act as a base in enzyme catalysis, as supported by experimental results from the corresponding Ser480Ala mutant of RlArDHT, is fully conserved among the IlvD/EDD protein family 27.
The active site of CcXyDHT. The subsequent analysis of the active site is based solely on the crystal structures of the holo forms of CcXyDHT and RlArDHT and not on the structure of So6PGDHT, because the latter represents an incomplete structure in the apo form. The electron density map of the CcXyDHT crystal structure shows that the binding of the Mg2+ ion and the [2Fe-2S] cluster to the active site is similar to that observed in RlArDHT 27.
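As a quick consistency check on the residue numbering, the two CcXyDHT motifs quoted in the text can be numbered from their stated first residues (128 and 482). The sketch below is illustrative only. The helper name `number_motif` is hypothetical, and the lookup table covers just the amino acids that occur in these two motifs.

```python
# Three-letter to one-letter codes for the residues present in the two motifs.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asp": "D", "Cys": "C", "Gln": "Q",
    "Gly": "G", "Ile": "I", "Leu": "L", "Lys": "K", "Met": "M",
    "Pro": "P", "Ser": "S", "Thr": "T",
}


def number_motif(motif, first_residue):
    """Map residue number -> one-letter code for a hyphen-separated motif."""
    residues = motif.split("-")
    return {first_residue + i: THREE_TO_ONE[res] for i, res in enumerate(residues)}


motif1 = number_motif("Cys-Asp-Lys-Thr-Thr-Pro-Ala-Gly-Ile-Met-Ala", 128)
motif2 = number_motif("Pro-Thr-Leu-Gly-Asp-Gly-Arg-Gln-Ser-Gly-Thr-Ala", 482)

print("".join(motif1.values()), min(motif1), max(motif1))  # CDKTTPAGIMA 128 138
print("".join(motif2.values()), min(motif2), max(motif2))  # PTLGDGRQSGTA 482 493

# Residues singled out in the text fall at the expected positions.
for number in (485, 486, 488, 490, 492):
    print(number, motif2[number])
```

The numbering places Cys128, Asp129 and Lys130 at the start of the first motif, and Gly485, Asp486, Arg488, Ser490 and Thr492 within the second, in agreement with the residue assignments discussed in the text.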
The hexacoordination of the Mg2+ ion resembles a square bipyramid in which Glu92, Asp129, Glu463 and a water molecule (or hydroxide ion) are approximately in the plane, and another water molecule (or hydroxide ion) and carbamylated Lys130 are in the axial positions. The water molecule in the square plane is hydrogen-bonded (2.7 Å) to Thr206. At the resolution obtained, we cannot fully exclude the possibility that Thr206 may also directly interact with the Mg2+ ion. However, the observed distance from Thr206 to the Mg2+ ion is approximately 3.8 Å, which argues against direct coordination. Identical residues can be found in the structure of RlArDHT, and all four Mg2+ binding residues (and Thr206) are fully conserved within the protein family (Fig. 3), suggesting that the binding of magnesium is a universal feature of the IlvD/EDD enzyme family. The crystal structure of CcXyDHT unambiguously shows that Lys130 is carbamylated (Fig. 5c,d), as is also found in the crystal structure of RlArDHT 27. The residues Asp129 and Lys130 (or more correctly KCX130) belong to the N-terminal sequence motif. The complete structures of CcXyDHT and RlArDHT support the interpretation that carbamylation of this lysine is a common feature among IlvD/EDD enzymes.
The crystal structure of CcXyDHT suggests that the enzyme has a [2Fe-2S] cluster, similar to that observed in RlArDHT. In the cluster, the Fe1 atom is coordinated by two sulfide ions and two cysteines (Cys60 and Cys128). Cys128 belongs to the Cys-Asp-Lys motif, which is fully conserved among IlvD/EDD enzymes. Interestingly, Cys60 is conserved only among the subfamily of IlvD/EDD enzymes that includes pentonate dehydratases and some dihydroxy acid dehydratases 19. In 6PGDHT, the putative corresponding cysteine belongs to a Cys-Asp-Gly motif (Fig. 1), which is not observed in pentonate dehydratases. This may reflect the organization of a different cluster type, for example, the binding of [2Fe-2S] versus [4Fe-4S] in the active site. In CcXyDHT, the Fe2 atom of the cluster is coordinated by two sulfide ions and one cysteine residue (Cys200). This cysteine residue is fully conserved within the IlvD/EDD protein family.
Residues Ser490 and Thr492 from the C-terminal sequence motif participate in the formation of the substrate binding site (Fig. 5c,d). Based on the crystal structure and mutagenesis studies of RlArDHT, we have previously suggested that this serine acts as a Lewis base in catalysis 27. In CcXyDHT, Ser490 is approximately 6 Å away from the [2Fe-2S] cluster and 4 Å away from the Mg2+ ion. As in RlArDHT, OG1 of Thr492 forms a short (and strong) hydrogen bond (2.6 Å) with OG of Ser490 in CcXyDHT. In addition, a hydrogen bond (3.2 Å) exists between the main chain nitrogen of Thr492 and OG of Ser490. Furthermore, NE1 of Trp171 is hydrogen bonded (2.9 Å) to OG1 of Thr492. This hydrogen bond network may increase the nucleophilicity of Ser490. Thr492 in CcXyDHT and the corresponding Thr482 in RlArDHT are not conserved within the IlvD/EDD protein family (they are replaced by Ala/Gly), as seen in Fig. 3. However, the next amino acid after Ala/Gly in the sequences of the members of this protein family is Ser/Thr, which could play a role similar to that of Thr492 in the CcXyDHT structure. The proposed elimination reaction is shown in Fig. 6, where the [2Fe-2S] cluster acts as a Lewis acid and the alkoxide ion form of Ser490 acts as a base. Finally, the enol product is tautomerized to the final C2 oxo product.

Molecular determinants for the substrate specificity.
To understand the difference in substrate specificity between CcXyDHT and RlArDHT, the active sites were superimposed. The preferred substrates for CcXyDHT and RlArDHT are stereoisomers of pentonates that have different configurations at the C4 position. The Km values for D-xylonate and L-arabinonate are in the mM range (2-10 mM) for both enzymes, but the kcat values differ more significantly. The catalytic efficiency kcat/Km is 120 times higher in CcXyDHT for D-xylonate compared to L-arabinonate, and 10 times higher in RlArDHT for L-arabinonate compared to D-xylonate. In addition to pentonate sugar acids, various hexonate sugar acids can be dehydrated, as long as the configurations at C2 and C3 are maintained 19.
The active site superimposition of CcXyDHT and RlArDHT shows that the binding of the Mg2+ ion and the [2Fe-2S] cluster is very similar, even identical, in both enzymes. However, there are significant differences in the active site cleft in the N-terminal helix (residues 21-33) from the dimeric counterpart. The N-terminal helix can therefore be identified as a substrate recognition helix, where Trp31 in RlArDHT is replaced by Arg30 in CcXyDHT. This Arg residue forms a salt bridge with Glu29 in CcXyDHT (the corresponding residue in RlArDHT is Gly30). Tyr27 in RlArDHT is replaced by Leu26 in CcXyDHT, and consequently, the conformation of Asn92 is altered. In addition, both enzymes have a C-terminal histidine residue from the dimeric counterpart (His591 in CcXyDHT and His579 in RlArDHT) in the active site (Fig. 5d). The carboxylate group of the C-terminus is salt-bridged to Arg22 and Lys422 in RlArDHT and to Arg174 in CcXyDHT. The differences in catalytic efficiency thus probably originate from the recognition helix and the C-terminus, which highlights the important role of dimerization.

Materials and Methods
Protein purification and crystallization. D-xylonate dehydratase from Caulobacter crescentus was heterologously expressed in E. coli, purified, and crystallized, and data were collected at 2.66 Å resolution using synchrotron radiation as described previously 19,31. The data were processed and scaled using XDS and XSCALE 32.
Structure determination and refinement. The crystal structure of CcXyDHT at 2.7 Å was determined by molecular replacement using PHASER 33 with the coordinates of L-arabinonate dehydratase (PDB code 5J84) as a template. Model building was performed in COOT 34. The model was refined using phenix.refine 35 and validated using MolProbity 36. The re-refinement of So6PGDHT (PDB code 2GP4) was also done using the PHENIX software 35.
UV-Vis spectroscopy. The presence of a [Fe-S] cluster in CcXyDHT was analysed using UV-Vis spectroscopy. A purified protein sample was diluted with a sample storage buffer consisting of 50 mM Tris-HCl and 5 mM MgCl2 at pH 7.5. The sample was loaded into a quartz cuvette and data were collected using a UV-Vis/NIR/900 spectrophotometer (PerkinElmer, USA) at wavelengths of 260-800 nm against sample storage buffer. The spectrum is shown in Figure S1.
Enzyme activity test. The CcXyDHT activity was assayed as described previously 19. The purified enzyme (6 μg in a 600 μl reaction) was incubated at 30 °C in 50 mM Tris-Cl, 10 mM MgCl2, at pH 8.5 using 20 mM D-xylonate as substrate. At 2-min intervals, a 100-μl sample was transferred to a microcentrifuge tube and the reaction was stopped by adding 10 μl of 12% trichloroacetic acid (TCA). The product formation was measured using the thiobarbituric acid (TBA) assay 37. Aliquots of 50 μl from all samples were transferred into fresh microcentrifuge tubes, after which 125 μl of 25 mM periodic acid (dissolved in 0.2 M H2SO4) was added and incubated at room temperature for 20 min. To the reaction, 250 μl of 2% sodium arsenate in 0.5 M HCl was added and mixed, followed by 1 ml of 0.3% TBA. The samples were then incubated at 100 °C for 10 min. Prior to reading the absorbance at 549 nm, the samples were mixed with an equal volume of DMSO.
Sequence alignment and structural comparisons. The amino acid sequences of the enzymes were collected from NCBI GenBank 38. The sequence alignment was carried out using the online multiple sequence alignment tool Clustal Omega 39 and MacVector 40. The secondary structure analysis was carried out using the online tool STRIDE 41. The alignment figure was created using ALINE 42.
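The working concentrations implied by the assay description can be derived with simple dilution arithmetic. The sketch below uses only the volumes and amounts stated above and assumes that volumes are additive on mixing. The variable names are illustrative and not part of the published protocol.

```python
# Enzyme concentration in the reaction: 6 μg in 600 μl.
enzyme_ug = 6.0
reaction_ul = 600.0
enzyme_ug_per_ml = enzyme_ug / reaction_ul * 1000.0

# Final TCA concentration after stopping a 100-μl sample with 10 μl of 12% TCA.
sample_ul = 100.0
tca_ul = 10.0
tca_stock_pct = 12.0
tca_final_pct = tca_stock_pct * tca_ul / (sample_ul + tca_ul)

# Total volume of one TBA assay tube before and after adding an equal volume of DMSO.
tba_volumes_ul = {
    "stopped sample aliquot": 50.0,
    "periodic acid": 125.0,
    "sodium arsenate": 250.0,
    "TBA": 1000.0,
}
tba_total_ul = sum(tba_volumes_ul.values())
after_dmso_ul = 2.0 * tba_total_ul

print(f"enzyme: {enzyme_ug_per_ml:.1f} μg/ml")
print(f"TCA after stopping: {tca_final_pct:.2f}%")
print(f"TBA assay volume: {tba_total_ul:.0f} μl, with DMSO: {after_dmso_ul:.0f} μl")
```

With these numbers the enzyme is present at about 10 μg/ml, the stopped samples contain roughly 1.1% TCA, and each TBA assay tube holds 1425 μl before the DMSO addition.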