Structure and substrate selectivity of the 750-kDa α6β6 holoenzyme of geranyl-CoA carboxylase

Geranyl-CoA carboxylase (GCC) is essential for the growth of Pseudomonas organisms with geranic acid as the sole carbon source. GCC has the same domain organization as, and shares strong sequence conservation with, the related biotin-dependent carboxylases 3-methylcrotonyl-CoA carboxylase (MCC) and propionyl-CoA carboxylase (PCC). Here we report the crystal structure of the 750-kDa α6β6 holoenzyme of GCC, which is similar to MCC but strikingly different from PCC. The structures provide evidence in support of two distinct lineages of biotin-dependent acyl-CoA carboxylases, one carboxylating the α carbon of a saturated organic acid and the other carboxylating the γ carbon of an α-β unsaturated acid. Structural differences in the active site region of GCC and MCC explain their distinct substrate preferences. In particular, a glycine residue in GCC is replaced by phenylalanine in MCC, which blocks access by the larger geranyl-CoA substrate. Mutation of this residue in the two enzymes can change their substrate preferences.

Terpenes are common biosynthetic precursors of many biological compounds, including hormones and cholesterol, and have long been studied for their many medicinal properties [1][2][3]. Monoterpenes are the simplest terpenes and are formed by the joining of two five-carbon isoprene units. Terpenes are modified and cyclized in various ways to form numerous terpenoids, including hormone precursors, scents, and flavourings. Isoprenylation is also used for localizing proteins to the cell membrane. In certain bacteria, terpenes can be used as the primary carbon source and metabolized to enter the tricarboxylic acid cycle as acetyl-CoA.
Geranyl-CoA carboxylase (GCC) activity was first purified from a Pseudomonas organism grown with either the related acyclic monoterpenoid citronellol 4 or geranic acid [5][6][7] as the sole carbon source. GCC also possesses 3-methylcrotonyl-CoA carboxylase (MCC) activity 5,6,8, and 3-methylcrotonyl-CoA is an intermediate in the complete catabolism of citronellol and geranic acid (Fig. 1a). In comparison, the authentic MCC enzyme cannot carboxylate the larger geranyl-CoA substrate. GCC and other enzymes involved in the first half of the geranic acid degradation pathway are encoded by the acyclic terpene utilization operon, while MCC and those involved in the second half are encoded by the leucine/isovalerate utilization operon 8,9. GCC is found in Pseudomonas and a collection of other bacteria. MCC is conserved from bacteria to humans, and is crucial for leucine metabolism (Fig. 1a). Deficiencies in MCC activity in humans are linked to methylcrotonylglycinuria and other serious metabolic diseases 10-12. GCC and MCC are members of the biotin-dependent carboxylase superfamily [13][14][15][16][17]. Both consist of two subunits, α and β, and their holoenzymes are 750-kDa α6β6 dodecamers. The α subunit contains the biotin carboxylase (BC), BC-CT interaction (BT) 18,19, and biotin carboxyl carrier protein (BCCP) domains (Fig. 1b, Supplementary Fig. 1). The β subunit contains the N and C domains of carboxyltransferase (CT, Supplementary Fig. 2). The α subunits of Pseudomonas GCC and MCC share 51% amino acid sequence identity, and their β subunits share 46% identity. These two enzymes therefore have high sequence conservation, but the molecular basis for their distinct substrate preferences is currently not known. In fact, many of the GCC/MCC enzymes in the sequence database have been misannotated (see below).
Propionyl-CoA carboxylase (PCC) shares the same domain organization as GCC and MCC (Fig. 1b). However, our recent crystal structures of MCC and PCC showed that their holoenzymes have strikingly different overall architectures 18,19. Moreover, the connectivity between the N and C domains is different in the two β subunits (Supplementary Fig. 3). These results led to the hypothesis that there are two distinct lineages of biotin-dependent acyl-CoA carboxylases 19: those that carboxylate the α carbon of the organic acid (PCC, acetyl-CoA carboxylase) and those that carboxylate the γ carbon of an α-β unsaturated acid (MCC, GCC) (Fig. 1a). A prediction of this hypothesis is that GCC should have a β subunit connectivity similar to that of MCC, and that the GCC holoenzyme may have an architecture similar to that of MCC. Here we report the crystal structure of the GCC holoenzyme, as well as mutagenesis studies to define the molecular basis for the distinct substrate preferences of GCC and MCC.

Results
Structure of GCC β subunit hexamer. We first determined the structure of Pseudomonas aeruginosa GCC (PaGCC) β subunit at 2.4 Å resolution. The current atomic model has good agreement with the crystallographic data and the expected geometric parameters (Table 1). The structure confirms our hypothesis, showing that GCCβ has the same connectivity between its N and C domains as that of MCCβ (Fig. 2a). The overall structures of the two subunits are similar, with root mean squared distance of 1.2 Å for 464 equivalent Cα atoms between them, consistent with their strong sequence conservation. The overall structures of the two β6 hexamers are similar to each other as well, with root mean squared distance of 1.3 Å for 2,866 equivalent Cα atoms (Fig. 2b). However, there are regions of substantial conformational differences between the two subunits. While the connectivity in GCCβ is the same as in MCCβ, the linker between the N and C domains has a different conformation (Fig. 2a). In addition, the C-terminal segment of GCCβ (residues 526-538) contacts the N domain, while that of MCCβ contacts the C domain. The long helix at the N-terminal end of GCCβ is disordered in one of the three β subunits in the asymmetric unit (Fig. 2b), although this helix is fully ordered in the GCC holoenzyme structure. Finally, structural differences in the active site region of the two enzymes define their distinct substrate preferences (see below).
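The root mean squared distance quoted above for equivalent Cα atoms can be illustrated with a minimal sketch. It assumes the two coordinate sets have already been superposed and paired atom by atom; the function name `rmsd` and the toy coordinates are illustrative only and are not taken from the GCC or MCC models.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared distance between paired (x, y, z) points.

    Assumes both lists are already superposed and listed in the same
    (equivalent-atom) order; no fitting is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy coordinates in Å (hypothetical, for illustration only):
# every atom in set B is shifted by 1 Å along z relative to set A.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = rmsd(toy_a, toy_b)
```

In practice the superposition and the choice of equivalent atoms are done by a structure-alignment program; the sketch covers only the final distance calculation.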
Structure of 750-kDa GCC holoenzyme. We next determined the crystal structure at 3.1 Å resolution of the Pseudomonas fluorescens (also known as P. protegens) GCC (PfGCC) holoenzyme (Table 1, Supplementary Fig. 4). The overall architecture of the GCC holoenzyme is similar to that of MCC (Fig. 3a), consistent with our hypothesis. The GCC α subunits are arranged as trimers above and below the GCC β6 hexamer core (Fig. 3b), forming an elongated, cylinder-shaped holoenzyme structure (Fig. 3a). The height of the holoenzyme is ~200 Å and the diameter of the cylinder is ~100 Å. When the β6 cores of GCC free enzyme and MCC holoenzyme in complex with CoA are superposed, the BC domain in each GCC α subunit shows a ~9° difference in orientation compared with its equivalent in MCC (Fig. 3b). A similar difference is seen for the orientation and position of the BT domains in the two enzymes (Fig. 3a). In comparison, a difference of ~6° was observed for the α subunit between the MCC free enzyme and the CoA complex 19, and the structure of GCC free enzyme is somewhat more similar to that of the MCC free enzyme, with a difference of ~4°. Therefore, there might be some inherent flexibility in the orientation of the α subunits relative to the β6 core in these holoenzymes.
The distance between the BC and CT active sites is ~80 Å in the GCC holoenzyme (Fig. 3a). Therefore, the entire BCCP domain must translocate during GCC catalysis, as is the case with MCC (ref. 19), PCC (ref. 18), long-chain acyl-CoA carboxylase 20, pyruvate carboxylase [21][22][23][24], and urea carboxylase 25. The BCCP-biotin is bound in the active site of the CT domain in the current structure, at roughly the same position as that in the MCC holoenzyme (Fig. 3a). In four of the six BCCP domains in the GCC holoenzyme, a portion of the structure located farthest from the holoenzyme core is disordered. In comparison, only one of the six BCCP domains is ordered in the MCC holoenzyme (Fig. 3a) 19.
We also carried out cryo electron microscopy (cryoEM) studies on the PfGCC holoenzyme and produced a reconstruction at 5.6 Å resolution (Fig. 4a,b). The crystal structure and the cryoEM reconstruction are in excellent agreement with each other overall. The cryoEM studies therefore provide important, independent confirmation that GCC assumes the same structure in solution. The four B subdomains of BC that are disordered in the crystal are observed by cryoEM. However, the two B subdomains observed in the crystal show a different conformation in the cryoEM reconstruction. The disordered segments of the BCCP domains in the crystal are also observed by cryoEM (Fig. 4a), but the overall positions of the BCCP domains in the CT active site are essentially the same in the crystal structure and cryoEM reconstruction.
A BT domain in the structure of GCC. The sequences of the BT domains are not well conserved among GCC, MCC and PCC, in contrast to the BC and CT domains (Supplementary Figs 1,2). The structure of the BT domain in GCC consists of a long α-helix (αV) surrounded by an eight-stranded anti-parallel β-barrel (β22-β29, Fig. 5a). In comparison, the helix of the MCC BT domain is shorter (Fig. 5b), and its β-barrel is missing a strand (β24, Fig. 5a), although this might be unique to P. aeruginosa and closely related MCCs (Supplementary Fig. 1) 19. In this regard, the structure of the GCC BT domain appears more similar to that of the PCC BT domain (Fig. 5c). On the other hand, the long loop connecting the helix to the first β strand, the 'hook' 18, has different conformations in the three BT domains, and mediates unique interactions that may help to determine the relative positions of the α and β subunits in these holoenzymes. These unique interactions also ensure the fidelity of each holoenzyme, such that a GCC-MCC hybrid cannot be produced.
The C-terminal segment of this hook in the GCC BT domain contains a short β-strand and establishes similar interactions with the β subunit as observed in MCC (Fig. 5a). The first residue of this β-strand is Trp483, mostly conserved among GCC and MCC enzymes (Trp543 in PaMCC, residue numbers according to human MCC, Supplementary Fig. 1). The side chain is buried at the interface between the N and C domains of the β subunit. This part of the hook is also located near the long helix (α0) at the N-terminal end of the β subunit, a structural feature that is absent in PCC.
The N-terminal segment of the hook in the MCC BT domain is much longer, and establishes interactions with the linker between the N and C domains of the β subunit (Fig. 5b). In fact, the conformational differences for this linker between GCC and MCC (Fig. 2a) are likely due to this interaction with the BT domain in MCC, as the conformation of the linker in the GCC holoenzyme would clash with this part of the hook in the MCC holoenzyme (Fig. 5b). This different conformation of the linker, by up to 10 Å, is in turn coupled to the different positioning of the C-terminal segments of the β subunits. The C-terminal segments of two monomers in a β2 dimer are swapped with each other between GCC and MCC (Fig. 5b), again illustrating the dramatic structural plasticity among these highly conserved enzymes.
Molecular basis for substrate preference. Based on the binding modes of CoA to MCC (ref. 19) and crotonyl-CoA to glutaconyl-CoA decarboxylase α (ref. 26), we built a model for the binding mode of geranyl-CoA in GCC (Fig. 6a). There is a large, mostly hydrophobic pocket at the bottom of the active site in GCC that can accommodate the larger geranyl group. The pocket is formed primarily by residues in helices α3, α4 and α5 in the N domain, with only minor contributions from residues in the C domain. Interestingly, the side chain of Phe191 in MCCβ is located in this pocket and clashes with the second isopentenyl group of geranyl-CoA. The equivalent residue in GCC is Gly162. There are also differences in the main-chain atoms of the residues in this region, and MCCβ contains a three-residue insertion here (Supplementary Fig. 2). Overall, these structural differences reduce the size of the active site pocket in MCCβ, thereby defining the molecular basis for why MCC is not active toward the larger geranyl-CoA substrate. Conversely, the larger pocket in GCC can accommodate both geranyl-CoA and 3-methylcrotonyl-CoA, and therefore it can be active toward both substrates.
To test these structural observations, we created the G162F mutant of GCCβ and the F191G mutant of MCCβ. The G162F mutation destabilized the GCC holoenzyme, which dissociated during gel filtration unless stabilized by higher salt concentrations. The activity of wild-type GCC toward the geranyl-CoA substrate (Fig. 6b) is about fourfold lower than that of wild-type MCC toward the 3-methylcrotonyl-CoA substrate (Fig. 6c). Wild-type GCC showed very weak activity towards 3-methylcrotonyl-CoA. The Km of this reaction is very high, suggesting that binding of the smaller CoA compound is not optimal in the GCC active site.
As predicted by the structure, the G162F mutant of GCC is catalytically inactive toward geranyl-CoA, while the F191G mutant of MCC showed appreciable activity towards this substrate, about fivefold lower than wild-type GCC (Fig. 6b). Conversely, the G162F mutant of GCC demonstrates stronger activity towards 3-methylcrotonyl-CoA than wild-type GCC, while the F191G mutant of MCC is essentially inactive (Fig. 6c).

Discussion
The kinetic parameters reported here for the wild-type enzymes are generally similar to those from previous studies on GCC and MCC. An apparent Km value of 50 μM was determined from our studies on PfGCC (Fig. 6b). However, only ~25% of the geranyl-CoA compound we used was in the cis configuration. Therefore, the actual Km value is likely ~12 μM, comparable to the 9 μM reported earlier for PaGCC (ref. 8). The Km value for PaMCC was 69 μM based on our studies (Fig. 6c), while the earlier study reported 10 μM as well as cooperative behaviour (Hill coefficient 2.3). However, in a more recent paper 27, the Km was found to be 220 μM and no cooperativity was reported. These other kinetic studies are based on the CO2 fixation assay, while our studies are based on the coupled enzyme assay monitoring ATP hydrolysis. Some of the differences in the kinetic parameters could also be due to the differences in the assay protocols.
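The correction from the apparent to the likely actual Km can be written out as a short sketch. It assumes, as argued in the text, that only the cis isomer binds and that the trans isomer is inert (neither substrate nor inhibitor), so the effective substrate concentration is the total concentration scaled by the cis fraction. The function name `effective_km` is illustrative.

```python
def effective_km(apparent_km, active_fraction):
    """Scale an apparent Km by the fraction of the compound that is active.

    Assumption: the inactive isomer neither binds nor inhibits, so
    [S]_active = active_fraction * [S]_total and Km scales the same way.
    The result is in the same concentration units as the input.
    """
    if not 0.0 < active_fraction <= 1.0:
        raise ValueError("active_fraction must be in (0, 1]")
    return apparent_km * active_fraction


# Apparent Km of 50 (PfGCC, total geranyl-CoA) with ~25% cis isomer
km_cis = effective_km(50.0, 0.25)
print(km_cis)  # 12.5, in the same units as the apparent value
```

The text rounds this figure to about 12; the exact cis fraction is itself an estimate from HPLC and NMR.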
Since GCC, MCC, and PCC have similar domain organizations (Fig. 1b) and conserved sequences, we have found that they are frequently misannotated in the sequence database. The structural information on the three holoenzymes suggests guidelines for how they can be classified based on their sequences. PCCβ lacks an extended α-helix (~30 residues) at the N terminus and can be separated from GCCβ and MCCβ based on this. The mitochondrial targeting sequence of animal PCCβ should be excluded from this consideration. MCCβ has a Phe residue in the active site region of the N domain, and this residue is replaced by a Gly in GCCβ. MCCβ also has a short insertion of 3 residues just before this Phe residue compared with GCCβ. With these guidelines, it should be possible to accurately annotate the different enzymes.
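The annotation guidelines above can be sketched as a small decision rule. This is a hypothetical illustration, not a tool used in this study: the class and function names are invented, and it assumes the three features have already been extracted from a sequence alignment against reference β subunits (with any mitochondrial targeting sequence removed beforehand).

```python
from dataclasses import dataclass


@dataclass
class BetaSubunitFeatures:
    # Extended N-terminal helix (about 30 residues) present?
    has_n_terminal_helix: bool
    # One-letter residue at the position aligned to Phe191 of MCC beta
    # (Gly162 of GCC beta)
    pocket_residue: str
    # Three-residue insertion just before the pocket residue?
    has_three_residue_insertion: bool


def classify_beta_subunit(features):
    """Apply the three sequence guidelines; return 'ambiguous' otherwise."""
    if not features.has_n_terminal_helix:
        return "PCC-like"
    if features.pocket_residue == "F" and features.has_three_residue_insertion:
        return "MCC-like"
    if features.pocket_residue == "G" and not features.has_three_residue_insertion:
        return "GCC-like"
    return "ambiguous"


# Hypothetical feature sets for illustration
example_gcc = BetaSubunitFeatures(True, "G", False)
example_mcc = BetaSubunitFeatures(True, "F", True)
print(classify_beta_subunit(example_gcc))  # GCC-like
print(classify_beta_subunit(example_mcc))  # MCC-like
```

Returning "ambiguous" for mixed features is a design choice of this sketch: sequences that satisfy only some of the guidelines are flagged for manual inspection rather than forced into a class.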
Overall, our studies on GCC, MCC and PCC support the presence of two distinct lineages of biotin-dependent acyl-CoA carboxylases. The structural differences among these enzymes also have wide implications for the relationship between sequence conservation and structural similarity. The differences in substrate preferences between GCC and MCC are defined by differences in their sequences, and consequently structures, in the active site region.

Methods
Protein expression and purification. GCC holoenzymes were overexpressed using a bi-cistronic plasmid, with GCCα (untagged) placed downstream of GCCβ (N-terminal His-tagged) in a similar fashion to the MCC and PCC holoenzymes 18,19. The hexa-histidine tag was not removed for crystallization.
PaGCC holoenzyme was overexpressed in Escherichia coli BL21 Star (DE3) cells (Novagen); expression was induced with 1 mM IPTG (Gold Biotechnology, Inc) and the cells were incubated at 20 °C overnight. The soluble protein was eluted from Ni-NTA beads (Qiagen) and further purified by gel filtration in a column buffer. The purified protein was concentrated to 12 mg ml−1, supplemented with 5% (v/v) glycerol, flash-frozen with liquid nitrogen and stored at −80 °C. An SDS gel of this sample showed that the α subunit was present in significantly substoichiometric amounts. P. fluorescens GCC (PfGCC) holoenzyme was expressed under the same conditions as the PaGCC holoenzyme and purified by gel filtration in a column buffer containing 20 mM Tris (pH 8.5), 200 mM NaCl and 10 mM DTT. The purified protein was concentrated to 20 mg ml−1, supplemented with 5% (v/v) glycerol, flash-frozen with liquid nitrogen and stored at −80 °C. An SDS gel of this sample showed stoichiometric amounts of both subunits.
Protein crystallization. Crystals of PaGCC β6 hexamer were obtained by using the PaGCC holoenzyme protein sample (but with significantly substoichiometric amounts of the α subunit) with the hanging-drop vapour diffusion method. Crystals of PaGCC β subunit appeared after 2 weeks at 20 °C from a precipitant solution containing 100 mM Tris (pH 8.0) and 60% (v/v) MPD. Crystals were flash-frozen in liquid nitrogen for data collection at 100 K. Interestingly, crystals obtained by using a pure PaGCC β6 sample diffracted very poorly.
Crystals of the PfGCC holoenzyme were obtained using the microbatch-under-oil method at 20 °C. The protein (20 mg ml−1) was mixed at a ratio of 2:1 with the precipitant solution, which contained 100 mM Tris (pH 7.0), 2 M (NH4)2SO4 and 250 mM Li2SO4. Crystals appeared after 1 month, and were cryoprotected by supplementing the precipitant solution with 30% (v/v) ethylene glycol.
Data collection and structure determination. X-ray diffraction data sets were collected on an ADSC Q315 CCD at the X29A beamline and on a Pilatus 6M detector at the X25 beamline of the National Synchrotron Light Source at the Brookhaven National Laboratory. The diffraction images were processed using the HKL package 28. The data processing and refinement statistics are summarized in Table 1. The resolution limit for the PfGCC holoenzyme data was determined using the CC1/2 criterion 29.
Crystals of PaGCCβ contained one β3 trimer in the asymmetric unit. The structure was determined by the molecular replacement method with the program Phaser 30, using the structure of MCCβ as the model 19. Crystals of the PfGCC holoenzyme contained half of the dodecamer (α3β3) in the asymmetric unit. The structure was determined with the molecular replacement method using GCCβ3 and MCCα3 as the search models. Structure refinement was carried out with the programs CNS 31 and Refmac 32, and the program Coot 33 was used for manual model rebuilding.
CryoEM imaging. To prepare cryoEM grids, we applied 2 μl of purified PfGCC sample to thin continuous carbon films on lacey grids (Ted Pella Inc.) for 1 min, blotted for 12 s under 100% humidity and plunged the grid into liquid ethane with an FEI Vitrobot (ambient temperature in the Vitrobot chamber was 20 °C). CryoEM images were collected at liquid nitrogen temperature in an FEI Titan Krios cryo electron microscope operated at 300 kV using parallel illumination. Before data collection, the microscope was carefully aligned, and beam tilt was minimized by coma-free alignment. Images were recorded on a Gatan K2 camera in counting mode at a nominal magnification of ×29,000 on the microscope. The magnification on the camera was calibrated as ×49,500 using a catalase crystal sample, giving a pixel size of 1.01 Å per pixel on the specimen. The dose rate of the electron beam was set to ~8 counts per pixel per s on the camera, which corresponds to ~10 e− Å−2 s−1 on the specimen when including the electrons uncounted by the K2 camera. Image stacks were recorded at 4 frames per s for 10 s. After drift correction with the UCSF software 34, the first 12 frames of each image stack (movie) were merged to generate a final image with a total dose of ~30 e− Å−2 on the sample.
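The imaging numbers above are mutually consistent, as the following short check illustrates. The frame rate, number of merged frames, dose rate and calibrated magnification are taken from the text; the 5 μm physical pixel size of the detector is an assumption added here (it is not stated in this section), and the variable names are illustrative.

```python
# Values stated in the text
frames_merged = 12           # first 12 frames of each movie
frames_per_second = 4.0      # recording rate
dose_rate = 10.0             # approx. electrons per square angstrom per second
calibrated_magnification = 49_500

# Assumption (not from the text): physical detector pixel of 5 micrometres
detector_pixel_um = 5.0

exposure_s = frames_merged / frames_per_second   # seconds of exposure merged
total_dose = dose_rate * exposure_s              # electrons per square angstrom

# 1 micrometre = 10,000 angstrom
pixel_size_angstrom = detector_pixel_um * 1e4 / calibrated_magnification

print(exposure_s, total_dose, round(pixel_size_angstrom, 2))
```

Under these assumptions the merged exposure is 3 s, giving the total dose of about 30 electrons per square ångström quoted above, and the specimen pixel size comes out at about 1.01 Å.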
CryoEM image processing. A total of 41,271 particle images were picked automatically from 1,149 images with the DoG Picker program 35. The underfocus values of these images were determined to be between 1.06 and 3.4 μm using CTFFIND 36. Image processing and reconstruction were carried out using RELION 37. Two-dimensional classification was first used to screen particles. After 25 iterations, 16 two-dimensional classes with 31,206 particles were picked from the total 30 classes. Three-dimensional (3D) classification was then performed with RELION to classify all particles into three groups. However, the reconstructions from these three groups were nearly identical. We therefore used all 31,206 particles for 3D refinement, which started from a featureless starting model generated by low-pass (60 Å resolution) filtering the structure of MCC (PDB code: 3U9T; ref. 19). The 32 symmetry of the particles was also applied in the calculation. After 16 cycles of 3D refinement, the resolution of the GCC structure was estimated to be ~5.6 Å using the 'gold-standard FSC' criterion with RELION. The density map was sharpened with a reverse B-factor of −250 Å2. Visualization and segmentation of density maps were carried out with UCSF Chimera 38.
Mutagenesis and kinetic studies. Site-specific mutations in PfGCC and PaMCC were introduced with the QuikChange kit (Agilent Technologies), and the mutants were expressed and purified under the same conditions as PfGCC (gel filtration buffer 20 mM Tris (pH 8.5), 200 mM NaCl, 5 mM DTT). The catalytic activity of PaMCC, PfGCC and their mutants was determined using a coupled enzyme assay, converting the hydrolysis of ATP to the disappearance of NADH (refs 39,40). The reaction mixture contained 100 mM HEPES (pH 8.0), 0.5 mM ATP, 8 mM MgCl2, 40 mM KHCO3, 2-500 μM 3-methylcrotonyl-CoA or crotonyl-CoA, 0.2 mM NADH, 0.5 mM phosphoenolpyruvate, 7 units of lactate dehydrogenase, 4.2 units of pyruvate kinase and 250 mM KCl. The absorbance at 340 nm was monitored for 5 min. Geranyl-CoA was synthesized chemically from geranic acid and CoA (Changchun Discovery Sciences, Ltd), and contained ~25% cis isomer based on HPLC and proton NMR measurements. Based on the structure, the trans isomer is unlikely to be accommodated in the active site, and therefore could not serve as a substrate or an inhibitor of the enzyme.
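In a coupled assay of this kind, the slope of the absorbance trace at 340 nm is converted to a reaction rate through the Beer-Lambert law. The sketch below is illustrative and rests on assumptions not stated in this section: the standard molar absorption coefficient of NADH at 340 nm (6,220 M−1 cm−1), a 1 cm path length, and one NADH oxidized per ATP hydrolysed in the pyruvate kinase/lactate dehydrogenase coupling. The function name and the example slope are hypothetical.

```python
NADH_EPSILON_340 = 6220.0  # M^-1 cm^-1, standard literature value (assumed here)


def atp_hydrolysis_rate_uM_per_min(delta_a340_per_min, path_length_cm=1.0):
    """Convert an A340 slope (per minute) into an ATP hydrolysis rate.

    NADH is consumed, so the slope is negative; the rate is reported as a
    positive number in micromolar per minute. Assumes 1 NADH per ATP.
    """
    rate_molar_per_min = -delta_a340_per_min / (NADH_EPSILON_340 * path_length_cm)
    return rate_molar_per_min * 1e6


# Hypothetical slope, not a measurement from this study
example_rate = atp_hydrolysis_rate_uM_per_min(-0.0622)
print(round(example_rate, 3))  # 10.0 (micromolar per minute)
```

Rates obtained this way at each substrate concentration would then be fitted to the Michaelis-Menten equation to obtain Km and kcat; that fitting step is not shown.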