Enzymatic reaction mechanism of cis-aconitate decarboxylase based on the crystal structure of IRG1 from Bacillus subtilis

Itaconate, which is formed by decarboxylation of cis-aconitate, an intermediate metabolite in the tricarboxylic acid cycle, has been used as a building block in polymer synthesis and is an important chemical in several biomedical and industrial applications. Itaconate is an immunometabolite with antibacterial, antiviral, immunoregulatory, and tumor-promoting activities. Recent focus has been on the role of itaconate in immunology, with immune-responsive gene 1 (IRG1) being identified as the cis-aconitate decarboxylase responsible for itaconate production. We solved the structure of IRG1 from Bacillus subtilis (bsIRG1) and showed that IRG1 adopts either a closed or an open conformation; bsIRG1 was in the open form. The A1 and A2 loops around the active site are flexible and can control the formation of the open and closed forms of IRG1. An in silico docking simulation showed that only the open form of IRG1 can accommodate the substrate. In the most energetically favorable position of cis-aconitate in the active site of bsIRG1, C2 and C5 of cis-aconitate were located in the H102 region and the H151 region of bsIRG1, respectively. Based on the structural study of bsIRG1, its comparison with IDS epimerase, and the in silico docking simulation, we proposed two tentative enzymatic reaction mechanisms for IRG1: a two-base model and a one-base model.


Results
Overall structure of IRG1 from Bacillus subtilis. At the initial stage of the structural study of the IRG1 family, we selected IRG1 orthologs from three different species: human (NCBI reference sequence ID: NP_001245335.1), mouse (NCBI reference sequence ID: NP_032418.1), and Bacillus subtilis (GenBank ID: ARW33836.1). Although all three genes were overexpressed and their proteins purified, only full-length IRG1 from B. subtilis containing residues 1-445 (hereafter called bsIRG1) was crystallized, and its structure was solved. The 1.78 Å crystal structure of full-length bsIRG1 was determined by the molecular replacement (MR) method. The previously reported structural homolog IDS epimerase (PDB ID: 2HP0), which shares 28% sequence identity with bsIRG1 26, was used as the initial search model for MR. The structure was refined to Rwork = 17.7% and Rfree = 20.8%. The crystallographic and refinement statistics are summarized in Table 1. There were two molecules in the asymmetric unit, chains A and B, both built from residues 3 to 445 (Fig. 1b). The extra sequence at the C-terminus (leucine and glutamic acid), which was derived from the plasmid construct, was also included in the final model. The structures of the two chains in the asymmetric unit were nearly identical, with a root-mean-square deviation (RMSD) of 0.58 Å (Fig. 1c).
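For reference, the crystallographic R-factor behind the R work and R free values quoted above is conventionally written as below. This is the standard textbook definition rather than a formula stated in this paper; the symbols F_obs, F_calc, and the scale factor k are notation assumed here.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - k\,|F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```

R work is evaluated over the reflections used in refinement, whereas R free applies the same expression to a small test set of reflections excluded from refinement.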
The structure of bsIRG1 revealed that it consisted of two distinct domains, a helical domain and a lid domain (Fig. 1d). The helical domain comprised the N-terminal 260 residues and the last 40 residues at the C-terminus, while the lid domain comprised approximately 140 residues in the middle of bsIRG1 (Fig. 1d). Because the working stoichiometry of the IRG1 family has not been studied, and there were two symmetric molecules in the crystallographic asymmetric unit, multi-angle light scattering (MALS) was used to determine the absolute molecular mass of bsIRG1 in solution. The experimental molecular mass of bsIRG1 in solution determined by MALS was 93.4 kDa (0.48% fitting error). Because the calculated molecular mass of monomeric full-length bsIRG1 (residues 1 to 445), including the C-terminal His-tag, was 47.7 kDa, the measured mass corresponds to a dimer, indicating that the working stoichiometry of bsIRG1 in solution is that of a dimer (Fig. 1e).
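The stoichiometry argument above is a simple ratio of the measured mass to the calculated monomer mass. A minimal sketch of that arithmetic, using the two masses reported in the text (the function name `oligomeric_state` is hypothetical and not part of any analysis software used in the study):

```python
def oligomeric_state(measured_kda: float, monomer_kda: float) -> tuple[float, int]:
    """Return the measured/monomer mass ratio and the nearest whole number of subunits."""
    if monomer_kda <= 0:
        raise ValueError("monomer mass must be positive")
    ratio = measured_kda / monomer_kda
    return ratio, round(ratio)


# Masses reported in the text: 93.4 kDa by MALS, 47.7 kDa calculated for the monomer
ratio, n_subunits = oligomeric_state(93.4, 47.7)
print(f"ratio = {ratio:.2f}, nearest oligomer = {n_subunits}")
```

A ratio close to 2 is what supports the dimer assignment; a ratio far from any integer would instead point to a mixture of species or an error in the assumed monomer mass.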
During structural refinement, we observed an unidentified electron density approximately 10 Å long on the side of the protein opposite the active site (Fig. 1f). Based on this electron density, we searched for a putative molecule using the ligand identification tool in the PHENIX program 27. This search identified 3-cyclohexyl-1-propylsulfonic acid (CXS) as a molecule that fitted well into the electron density; hence, CXS was included in the model (Fig. 1f). Because CXS was not used in the purification and crystallization steps, endogenous CXS might have been incorporated into bsIRG1 in the bacteria during expression. The CXS binding pocket was formed between the lid domain and the helical domain on the opposite side of the active site, and CXS fitted well into the binding pocket (Fig. 1g). The solvent-accessible surface areas of CXS and bsIRG1 involved in the interaction of CXS with the binding pocket were 300 Å² (80% of CXS) and 200 Å² (1.1% of bsIRG1), respectively. The main forces involved in CXS binding to bsIRG1 were hydrogen bonds between the sulfonic acid moiety of CXS and Y269, R283, and L339 of bsIRG1, and hydrophobic interactions between the cyclohexyl group of CXS and the hydrophobic pocket formed by F93, L408, and L419 of bsIRG1 (Fig. 1h).
[...] (Fig. 2a) 26. Among the three isomers of IDS (the R,R-isomer, R,S-isomer, and S,S-isomer), the R,R-isomer and S,S-isomer are converted by IDS epimerase into the R,S-isomer (Fig. 2b) 28. To understand the working mechanism of IRG1, the structure of bsIRG1 was compared with that of IDS epimerase (PDB ID: 2HP0) 26. Although the overall structures are similar, both consisting of a helical domain and a lid domain, with a root-mean-square deviation (RMSD) of 2.3 Å, the lid domain of bsIRG1 was tilted by approximately 8° relative to the lid domain of IDS epimerase (Fig. 2c).
The most distinct structural difference between the two proteins was the location of the A2 loop, one of the two loops (A1 and A2) located around the active site (Fig. 2c). The A2 loop of bsIRG1 was positioned away from the active site, whereas the A2 loop of IDS epimerase was close to the active site. This indicates that the A2 loop might be flexible in the MmgE/PrpD protein family and might control the open and closed conformations of this enzyme family (Fig. 2c). The charge distribution and surface features of the two proteins are similar to each other, especially in the active site. A deep cavity lined with basic residues was detected in the active site of both proteins (Fig. 2d). This basic character of the active site could help to accommodate the negatively charged carboxyl groups of the substrates, IDS for IDS epimerase and cis-aconitate for bsIRG1. The amino acid residues in the active site are completely conserved between IDS epimerase and bsIRG1, although the positions of the side chains of R100 and Y146 were not identical (Fig. 2e). The dimeric form of IDS epimerase, the functional unit of this enzyme, was also compared with the current dimeric structure of bsIRG1. The structure of the bsIRG1 dimer was almost identical to that of the IDS epimerase dimer, with an RMSD of 2.5 Å (Fig. 2f). Both dimeric structures used the same dimerization interface to form a stable dimer (Fig. 2f).

Structural comparison between IRG1 from different species.
During the preparation of the manuscript reporting the structure of bsIRG1, structures of IRG1 from other species, including mouse (mIRG1) and human (hIRG1), were reported by Pessler and co-workers 29. bsIRG1 shares 25% sequence identity with mIRG1 and 24% with hIRG1 (Fig. 3a). To compare the structures, bsIRG1 was superposed on mIRG1 and hIRG1. Although the overall structure of bsIRG1 was almost identical to those of the other species, with an RMSD of 2.5 Å with mIRG1 and 2.6 Å with hIRG1, the lid domain of bsIRG1 was tilted by around 12.5° (Fig. 3b). Similar to the observation with IDS epimerase, the A2 loops of hIRG1 and mIRG1 were in a closed conformation, positioned toward the active site, whereas the A2 loop of bsIRG1 was wide open, positioned away from the active site (Fig. 3b,c). Another distinct structural difference was observed in the connection between the A1 loop and the A1 helix. In hIRG1, this region was positioned toward the active site together with the A2 loop, whereas the same region of bsIRG1 was located far from the active site (Fig. 3c). Because both the A2 loop and the A1 loop connected with the A1 helix lie close to the active site of hIRG1, and even cover it, the previously reported structure of hIRG1 was considered to be a closed form of the IRG1 family. In contrast, the same region of bsIRG1 was displaced, leaving the active site open, which indicated that our structure might represent the open conformation of the IRG1 family. In the case of mIRG1, the A1 loop connected with the A1 helix could not be modeled because of the lack of electron density 29, indicating that this region might be flexible and might function as a gate that controls the open and closed states.
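Sequence identity values such as the 25% and 24% quoted above are derived from a pairwise alignment. A minimal sketch of one common way to count identity, assuming it is taken over aligned columns in which neither sequence has a gap; the identity matrix reported by the alignment software may use a different denominator, and the two short sequences below are hypothetical, not IRG1 sequences:

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment (11 columns, 9 ungapped, 7 identical)
seq_a = "MKT-AYIAKQR"
seq_b = "MRTLAY-AKHR"
identity = percent_identity(seq_a, seq_b)
print(f"{identity:.1f}% identity")
```

Which columns enter the denominator (all columns, ungapped columns, or the shorter sequence length) changes the reported percentage, so the convention should be stated whenever identities are compared across studies.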
The closed and open conformations of IRG1 were more clearly distinguished by comparing the surface charge distributions of bsIRG1 and hIRG1. The surface analysis of bsIRG1 clearly showed the deep, positively charged cavity in the active site that accommodates the negatively charged substrate, cis-aconitate, whereas hIRG1 showed a shallow, negatively charged active site covered by the A2 loop and the A1 loop connected with the A1 helix (Fig. 3d). These observations indicate that the structure of bsIRG1 is an open form, whereas that of hIRG1 is a closed form. Most of the amino acid residues in the active site are conserved between bsIRG1 and mammalian IRG1 (Fig. 2e). However, unlike the complete conservation of amino acid residues in the active sites of bsIRG1 and IDS epimerase, H277 and Y318 in hIRG1 as well as Y319 in mIRG1, which were important for the activity of IRG1 in the previous study 29, are not conserved in bsIRG1 (Fig. 3e).
Table 1. Data collection and refinement statistics. RMSD, root-mean-square deviation. a Values for the outermost resolution shell are in parentheses. [...] is the observed intensity of reflection h, and <I(h)> is the average intensity obtained from multiple measurements.
Because the structure of mammalian IRG1 was dimeric, these dimeric structures were compared with the dimeric structure of bsIRG1 by superposition analysis. The dimeric structure of bsIRG1 was almost identical to that of the mammalian dimers, with an RMSD of 2.7 Å with hIRG1 and 2.8 Å with mIRG1 (Fig. 3f). All the dimeric structures of IRG1 used the same dimerization interface to form a stable dimer (Fig. 3f).
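The RMSD values used throughout these comparisons summarize the distances between equivalent atoms in two structures. A minimal sketch of the calculation, assuming the two coordinate sets have already been superposed and paired atom by atom (the optimal superposition step itself is not shown); the coordinates below are hypothetical and are not taken from any deposited structure:

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) tuples, in the input units."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between paired atoms
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical paired C-alpha coordinates in Å
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
mobile = [(0.0, 0.0, 1.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
value = rmsd(reference, mobile)
print(f"RMSD = {value:.2f} Å")
```

Because the value depends on which atoms are paired and how the structures are superposed, RMSDs computed over whole chains, single domains, or dimers are not directly interchangeable.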
Putative catalytic mechanism of itaconate production by IRG1. To understand the catalytic mechanism by which IRG1 produces itaconate from cis-aconitate, we attempted to solve the structure of the enzyme/substrate complex. Because this attempt was unsuccessful, a docking simulation using the MAESTRO docking software was undertaken instead. The in silico docking simulation of cis-aconitate and bsIRG1 indicated that the C1 and C6 carboxylate groups of cis-aconitate form an extensive network of hydrogen bonds with D91, H102, K200, K266, and C271 of bsIRG1, whereas the C5 carboxylate forms hydrogen bonds only with H151 of bsIRG1 (Fig. 4a). Because the C5 carboxylate has been reported to be the leaving group in the decarboxylation reaction of IRG1 30, the small number of interactions between the C5 carboxylate and the enzyme will probably facilitate the dissociation of itaconate after the enzymatic reaction of IRG1. In this simulation, Y146, which is not conserved in mammalian IRG1, was involved in the interaction with the C6 carboxylate. In hIRG1, Y318 performed the role of Y146 of bsIRG1 by interacting with the C6 carboxylate of the substrate (Fig. 4a). H102, which is known to be the most important residue in the IRG1 reaction, was positioned near the C2 region of the substrate, where protonation occurs, indicating that H102 of bsIRG1 (H103 of hIRG1 and mIRG1) might be the basic residue responsible for the protonation and decarboxylation of the substrate (Fig. 4a) 30. Based on the structural study of bsIRG1, the comparison with IDS epimerase, and the docking simulation, we proposed two tentative enzymatic mechanisms for IRG1. In one mechanism, two bases from the active site of IRG1 are involved in the decarboxylation of cis-aconitate. One base facilitates the deprotonation of the C5 carboxyl group, and the other base facilitates the protonation of C2 of the substrate, accompanied by the departure of carbon dioxide (CO2) (Fig. 4b). This reaction produces itaconate and CO2. The other possible mechanism involves only one base from the enzyme, which is involved in both the deprotonation and decarboxylation of cis-aconitate (Fig. 4b). In this case, the responsible base might be H102 of bsIRG1 (H103 of hIRG1 and mIRG1), which is the most critical residue for the activity of IRG1.
Based on the structural comparison of bsIRG1 with IDS epimerase, mIRG1, and hIRG1, we noticed that bsIRG1 was in an open form, in which the two loops (A1 and A2) were positioned so as to expose the active site. The structural comparison showed that the A1 and A2 loops are flexible and control substrate accessibility by working as a gate. The comparison also showed that the lid domain could move to adjust the active site. We also realized that the active site of IRG1, which has decarboxylase activity, and the active site of IDS epimerase, which has epimerization activity, were very similar with respect to the presence of two putative working bases. For IDS epimerase, H99 and Y145 have been suggested as the two basic residues for the epimerization reaction 26. Interestingly, these two residues are conserved in bsIRG1 but not in mammalian IRG1. In mIRG1 and hIRG1, although H99 is conserved (H103 in mIRG1 and hIRG1), Y145 is not. Unlike the complete conservation of amino acid residues in the active sites of bsIRG1 and IDS epimerase, H277 and Y318 in hIRG1 (Y319 in mIRG1), which were previously shown to be important for the activity of IRG1 29, are not conserved in bsIRG1. Structural comparison showed that the location of the side chain of Y318 in hIRG1 is similar to that of the side chain of Y146 in bsIRG1, indicating that Y146 of bsIRG1 might share the function of Y318 in hIRG1.
To understand the catalytic mechanism of IRG1, we performed a docking simulation, which showed that cis-aconitate fit well in the active site of bsIRG1 but not in that of hIRG1 when the active sites were selected for generating the receptor grid. The inability of the substrate to dock in the active site of hIRG1 might be due to its closed conformation, which did not allow the substrate access to the active site. Based on the structural study of bsIRG1, the comparison with IDS epimerase, and the in silico docking simulation, we proposed two tentative enzymatic mechanisms of IRG1, one involving two bases and the other involving one base. In the two-base model, one base facilitates the deprotonation of the C5 carboxyl group, while the other base is involved in the protonation of C2 of the substrate, facilitating the departure of CO2 and producing itaconate. Taking into account the position of the side chain, the leaving group (C5 carboxylate), and the place of protonation (C2 region), the bases responsible for the deprotonation of the C5 carboxyl group and the protonation of C2 might be H159 and H103, respectively. These two important residues are completely conserved in the different species (Fig. 3a) and are critical for the activity of IRG1 29. However, if IRG1 were to use only one base for decarboxylation, H103 might be responsible for both the protonation and deprotonation processes (Fig. 4b) because this residue was found to be the most important residue for IRG1 activity 29. Moreover, H99 was the base considered to be critical for the reaction of IDS epimerase, a protein with the same fold 26. Thus, H99 of IDS epimerase and H103 of hIRG1 might be the bases responsible for epimerization and decarboxylation, respectively. This base can work alone or together with another neighboring base. The structure of the substrate/IRG1 complex has to be determined to elucidate the precise working mechanism of IRG1.

Methods
Crystallization and data collection. For initial crystallization, 1 µL of protein solution was mixed with an equal volume of reservoir solution, and the droplet was allowed to equilibrate against 300 µL of the mother liquor using the hanging drop vapor diffusion method at 20 °C. The initial hit was obtained from a buffer comprising 2.0 M (NH4)2SO4, 0.1 M cacodylate (pH 6.5), and 0.2 M NaCl. The crystallization conditions were further optimized and finally adjusted to a buffer composition of 1.5 M (NH4)2SO4, 0.1 M CAPS (pH 9.8), and 0.2 M Li2SO4. Crystals suitable for data collection appeared in 1 day and grew to a maximum size of 0.2 × 0.5 × 0.2 mm³. For data collection, the crystals were soaked in the mother liquor supplemented with 40% (v/v) glycerol as a cryoprotectant and flash-cooled in a stream of N2 at −178 °C. X-ray diffraction data were collected on beamline 5C at the Pohang Accelerator Laboratory (Pohang, Republic of Korea) at a wavelength of 1.0000 Å. The diffraction data were indexed, integrated, and scaled with the HKL-2000 program 31.
Structure determination and analysis. The structure was determined by the molecular replacement (MR) phasing method using PHASER 32. The previously solved structural homolog IDS epimerase (PDB ID: 2HP0), which shares 28% sequence identity with bsIRG1, was used as a search model 26. The initial model was built automatically with AutoBuild in PHENIX 27 and completed with Coot 33. Model refinement was performed iteratively using phenix.refine in PHENIX 27. The quality of the model was validated using MolProbity 34. All the structural figures in this paper were generated using the PyMOL program 35.
SEC-MALS analysis. The absolute molar mass of bsIRG1 in solution was determined by MALS. The target protein, filtered through a 0.2 µm syringe filter, was loaded onto a Superdex 200 10/300 gel-filtration column (GE Healthcare) that had been pre-equilibrated in buffer comprising 20 mM Tris-HCl (pH 8.0) and 150 mM NaCl. The mobile phase was run at a flow rate of 0.4 mL/min at room temperature. A DAWN-treos MALS detector (Wyatt Technology, Santa Barbara, USA) was connected to the ÄKTA explorer system (GE Healthcare). The molecular mass of bovine serum albumin was used as a reference value. Data for the absolute molecular mass were assessed using the ASTRA program (Wyatt Technology).
Sequence alignment. The amino acid sequences of IRG1 from different species were analyzed using Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/).
Scientific Reports | (2020) 10:11305 | https://doi.org/10.1038/s41598-020-68419-y www.nature.com/scientificreports/
In silico molecular docking simulation. In silico docking calculations were performed using GLIDE in the MAESTRO program 36 on a Linux workstation. cis-Aconitate and bsIRG1 were used as the docking ligand and receptor, respectively. As the initial step of the docking process, the ligand (cis-aconitate) was prepared with the LigPrep module, which refined the geometry of the chemical structure of cis-aconitate and set up its 3D structure with accurate chirality. For receptor preparation, the Protein Preparation Wizard of the MAESTRO program was used. Chain A of the solved bsIRG1 structure was selected and edited to add missing hydrogens and to assign proper bond orders. The edited structure was minimized to the default RMSD value. A grid box centered on residue H102 (H103 in hIRG1) was generated using the default Glide settings. cis-Aconitate was docked into this defined grid box of the receptor. No constraints on ligand-receptor interactions were set.
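The idea of a grid box centered on a residue can be illustrated geometrically. The following is a minimal sketch, not the Glide implementation or its API: it assumes the box center is the centroid of the residue's atoms and that the box is a cube, and all coordinates, the edge length, and the function names `centroid` and `inside_box` are hypothetical.

```python
def centroid(atoms):
    """Mean position of a non-empty list of (x, y, z) coordinates."""
    n = len(atoms)
    if n == 0:
        raise ValueError("need at least one atom")
    return tuple(sum(axis) / n for axis in zip(*atoms))


def inside_box(point, center, edge):
    """True if point lies within a cube of the given edge length centered on center."""
    half = edge / 2.0
    return all(abs(p - c) <= half for p, c in zip(point, center))


# Hypothetical atom coordinates (Å) for the residue used to center the box
residue_atoms = [(10.0, 20.0, 30.0), (12.0, 20.0, 30.0), (11.0, 23.0, 30.0)]
box_center = centroid(residue_atoms)
box_edge = 20.0  # hypothetical edge length in Å, not a Glide default

# Hypothetical ligand atom positions (Å): the first falls inside the box, the second outside
ligand_atoms = [(14.0, 22.0, 33.0), (25.0, 21.0, 30.0)]
flags = [inside_box(atom, box_center, box_edge) for atom in ligand_atoms]
print(box_center, flags)
```

A check of this kind only shows whether a pose lies within the search volume; it says nothing about the docking score or the hydrogen-bond network described in the Results.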
Structural data accession number. Coordinates and structure factors were deposited in the Protein Data Bank under PDB ID: 7BRA.