Deciphering the late steps of rifamycin biosynthesis

Rifamycin-derived drugs, including rifampin, rifabutin, rifapentine, and rifaximin, have long been used as first-line therapies for the treatment of tuberculosis and other deadly infections. However, the late steps leading to the biosynthesis of the industrially important rifamycin SV and B remain largely unknown. Here, we characterize a network of reactions underlying the biosynthesis of rifamycin SV, S, L, O, and B. The two-subunit transketolase Rif15 and the cytochrome P450 enzyme Rif16 are found to mediate, respectively, a unique C–O bond formation in rifamycin L and an atypical P450 ester-to-ether transformation from rifamycin L to B. Both reactions showcase interesting chemistries for these two widespread and well-studied enzyme families.

Fig. 1 The biosynthetic network of late rifamycin derivatives. R-SV can be oxidized to R-S spontaneously in the presence of dioxygen and divalent metal ions. The transketolase Rif15 is responsible for transferring a C 2 keto-containing fragment from a 2-ketose to R-S, giving rise to R-L. The P450 enzyme Rif16 catalyzes the transformation from R-L to R-O in the presence of NADPH, ferredoxin (Fdx), and ferredoxin reductase (FdR). Finally, R-O is nonenzymatically reduced to R-B by NADPH.
Purified Rif16 appeared to be a functional P450 enzyme, as it had the expected red color and showed a signature peak at 450 nm in its CO-reduced difference spectrum (Supplementary Fig. 4). To reconstitute the in vitro activity of Rif16, we used two surrogate redox partner proteins to shuttle electrons from NADPH to the heme-iron reactive center for P450 catalysis: the ferredoxin seFdx (SynPcc7942_1499) and the ferredoxin reductase seFdR (SynPcc7942_0978), both of which are from the cyanobacterium strain Synechococcus elongatus PCC 7942 and were here expressed heterologously in E. coli and purified 22 . Against our expectations, Rif16 was not able to catalyze the conversion from R-SV to R-S, while R-S was readily reduced to R-SV by addition of NADPH alone (Supplementary Fig. 5). Importantly, the hydroquinone R-SV was spontaneously oxidized to the quinone R-S by ambient O 2 , and this transformation was dramatically accelerated by the presence of divalent metal ions (e.g., Cu 2+ and Mn 2+ ) (Supplementary Fig. 6), similar to previously reported findings 23 . However, we cannot exclude the possibility that an oxidase might be responsible for enzymatic oxidation of R-SV into R-S in vivo. Taken together, our results suggest that Rif16, rather than performing a normal biooxidation, may catalyze an atypical P450 reaction in rifamycin biosynthesis.
Functional characterization of Rif15. We next evaluated the in vitro activity of Rif15a/Rif15b at a 1:1 ratio (i.e., the reconstituted Rif15 transketolase) in the presence of R-S and F-6-P as the potential C 2 keto acceptor and donor, respectively, with ThDP and MgCl 2 as cofactors. As predicted, Rif15 converted R-S into a different product with higher polarity than R-B, while the single subunits (that is, either Rif15a or Rif15b alone) were not able to catalyze the same transformation. Additionally, we found that both ThDP and Mg 2+ were required for the catalytic activity of Rif15 (Fig. 2a, traces i-vi). This is unsurprising, since the diphosphate moiety of ThDP is bound to the transketolase through a bivalent cation to form the catalytically active holo-enzyme from the apo-protein 24,25 . Protein sequence alignment of multiple transketolases shows that the residues involved in the interactions with ThDP and the metal ion are highly conserved regardless of their origins and subunit organization modes (Supplementary Fig. 7).
High performance liquid chromatography-high resolution mass spectrometry (HPLC-HRMS) analysis revealed that the m/z value of the product was 754.3069 ([M-H] − , deduced to be [C 39 H 48 NO 14 ] − ) (Supplementary Fig. 8), which is consistent with that of R-L or R-B in negative ion mode (calc. 754.3069). Since the retention time of this product was distinct from that of R-B, we suspected that the product was R-L. Both the 1D ( 1 H, 13 C) and 2D NMR spectra (Table 2) of the purified product were acquired, and spectral comparisons of the proton NMR data obtained from the product and the substrate R-S (Supplementary Fig. 9) revealed a new set of CH 2 signals (δ H-39 …). Moreover, Xu-5-P, Ru-5-P, S-7-P, and DHA were also able to serve as alternative C 2 keto donors for Rif15, with Xu-5-P being optimal in terms of conversion ratios under the same conditions (Supplementary Fig. 14).
To examine whether R-SV is also a direct precursor of R-L as previously proposed 2 , similar Rif15 reactions were performed using R-SV as substrate, and, interestingly, we detected a small amount of R-L as a product (Fig. 2a, trace vii). In light of the spontaneous oxidation from R-SV to R-S by O 2 that we had observed in earlier assays (in aqueous solution in the presence of Mg 2+ , Supplementary Fig. 6), 2 mM ascorbic acid was added to the Rif15 reactions to protect R-SV from oxidation (Fig. 2a, trace viii, ix) 26 . Upon this addition, we no longer detected R-L as a reaction product (Fig. 2a, trace x), establishing that the previously detected R-L was actually derived from the spontaneously formed R-S.
Collectively, our results from these in vitro assays demonstrate that Rif15 is a two-component transketolase that transforms a ketone (in R-S) into an ester (in R-L), a reaction that, to the best of our knowledge, has not been found previously in natural product synthesis. Mechanistically, the deprotonation of ThDP at the thiazolium ring generates a carbanion, which is responsible for cleaving the C-2/C-3 bond in the 2-ketose. The resultant ThDP-bound dihydroxyethyl group then performs a nucleophilic attack on the C-4 carbonyl carbon of R-S, which is followed by bond rearrangements and re-aromatization, ultimately yielding R-L (Fig. 3a).
Biochemical, structural, and mechanistic characterization of Rif16. Having characterized the R-SV→R-S→R-L transformation, we next sought to resolve the conversion of R-L into R-B. These two rifamycin derivatives have the same oxidation state, but we still chose to test the activity of Rif16 against R-L, since this P450 enzyme was previously shown to be required for R-B biosynthesis 17 . Indeed, we found that R-L was significantly converted into R-B by Rif16 in the presence of seFdx, seFdR, and NADPH (Fig. 2b, trace xiii, xiv); the structure of the product was confirmed by the identical retention time of the product and the R-B authentic standard, co-elution with R-B in a co-injection experiment (Fig. 2b … Fig. 15). These results clearly establish that Rif16 is the long-sought R-B synthase of rifamycin biosynthesis.
The triangles indicate the carbon signals of residual glycerol derived from enzymatic reaction buffer.
To elucidate the catalytic mechanism for this atypical ester-to-ether transformation, the crystal structures of substrate-free Rif16 (PDB ID code: 5YSM, Fig. 4a) and R-L-bound Rif16 (PDB ID code: 5YSW, Fig. 4b) were solved at 1.90 Å and 2.60 Å resolution, respectively. In both structures, the asymmetric unit contained a single molecule with the typical cytochrome P450 fold. The BB′ loop-B′ helix-B′C loop region, which is known to be important for substrate specificity determination 27 , is significantly longer than those of many P450 enzymes that recognize smaller substrates (Supplementary Fig. 16). The missing electron density for this region in both structures suggests great structural flexibility. Both findings might help explain how Rif16 is able to accommodate its bulky substrate R-L, which represents one of the largest substrates for a P450 enzyme with a substrate-bound crystal structure available 28 . In the absence of substrate, Rif16 adopts an open conformation characterized by retraction of the F and G helices, loss of order in the B′ helix, and missing electron density for the B′C and FG loops (Fig. 4a). A water molecule that is 2.5 Å away from the heme-iron forms the sixth axial ligand of Fe 3+ (Supplementary Fig. 17a). Upon binding R-L, the FG loop of Rif16 becomes ordered but the B′ helix and the B′C loop remain disordered (Fig. 4b), so that the enzyme adopts a partially open conformation rather than the closed conformation observed in many substrate-bound P450 enzymes 29,30 (Fig. 4c).
In the substrate-bound structure (Fig. 4d and Supplementary Fig. 17b), R-L forms hydrogen bonds with residues S76, N108, F308, and T407, and additionally interacts with residues I107, A196, P197, I255, V303, P305, I307, F310, and L406 via hydrophobic interactions (all of these residues are within 5 Å of R-L). Critically, the axial water ligand is displaced and the hydroxyl group at C-39 of R-L is closest (4.5 Å) to the heme-iron reactive center. These structural features suggest a possible mechanism for R-B production (Fig. 3a): the ferryl-oxo species (Compound I) likely abstracts the hydrogen atom of the C-39 hydroxyl group, leading to formation of a substrate radical and the ferryl-hydroxo Compound II. The resultant oxygen radical can then directly attack the neighboring arene to form a five-membered ring pendant, and the radical would be delocalized to the aromatic ring. Next, the relocation of the spirocyclic intermediate could induce the second hydrogen abstraction from the C-1 hydroxyl group by Compound II. This diradical mechanism might result in the formation of R-O. Notably, similar mechanisms, in which two alternative substrate binding poses are responsible for hydrogen abstractions from two distant sites, have been proposed for C-O coupling reactions catalyzed by a number of P450 enzymes 31,32 . Finally, the pentabasic cyclic compound R-O could be reduced to R-B (rather than R-L) by the NADPH-derived hydride, since the carboxylic acid is a better leaving group than the alcohol. This R-L→R-B conversion therefore involves no net oxidation or reduction. To dissect this unusual P450 reaction experimentally, we elected to oxidize R-L by taking advantage of the peroxide shunt pathway of Rif16 18 , in which H 2 O 2 acts as the sole oxygen and electron donor of Rif16; this approach allowed us to bypass the dual roles of NADPH from our previous reaction system (its roles as an electron donor for the P450 enzyme and as a hydride provider for direct reduction of R-O).
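The 5 Å contact criterion used above can be illustrated with a minimal Python sketch. The coordinates and the residue label G200 below are hypothetical placeholders, not values taken from the 5YSW structure; the function only shows how a distance cutoff selects contacting residues.

```python
import math

def residues_within(protein_atoms, ligand_atoms, cutoff=5.0):
    """Return sorted labels of residues with any atom within `cutoff` Å of any ligand atom."""
    close = set()
    for label, coord in protein_atoms:
        if any(math.dist(coord, lig) <= cutoff for lig in ligand_atoms):
            close.add(label)
    return sorted(close)

# Hypothetical coordinates in Å (illustration only, not from PDB 5YSW).
protein_atoms = [
    ("S76", (1.0, 0.0, 0.0)),    # 1.0 Å from the first ligand atom
    ("N108", (0.0, 4.0, 0.0)),   # 4.0 Å from the first ligand atom
    ("G200", (12.0, 0.0, 0.0)),  # hypothetical distant residue, outside the cutoff
]
ligand_atoms = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.5)]

print(residues_within(protein_atoms, ligand_atoms))
```

In practice the atom lists would be read from the deposited coordinate file; the parsing step is omitted here.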
Interestingly, R-S was the dominant product from this reaction (Fig. 2b, trace xvi), which likely resulted from the spontaneous hydrolysis of the P450 product R-O 15,33 (Fig. 2b, trace xvii). The addition of NADPH into the Rif16/R-L/H 2 O 2 system led to predominant production of R-B (Fig. 2b, trace xviii, xix), as R-O can be reduced to R-B in the presence of NADPH (Fig. 2b, trace xx) 34,35 . Furthermore, the unstable compound R-O with the correct m/z value of 752.2920 ([M-H] -, calc. 752.2924) was observed in a time-course study ( Supplementary Fig. 18). These results strongly suggest that R-O is the intermediate that enables the conversion of R-L to R-B.
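The agreement between the observed and calculated m/z values for R-O can be expressed as a mass error in parts per million. The following Python sketch uses only the two values quoted above; the 5 ppm acceptance window is an assumption chosen for illustration, not a threshold stated in the text.

```python
def ppm_error(observed_mz, calculated_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - calculated_mz) / calculated_mz * 1e6

def within_tolerance(observed_mz, calculated_mz, tol_ppm=5.0):
    """True if the absolute error is within tol_ppm (5 ppm is an assumed window)."""
    return abs(ppm_error(observed_mz, calculated_mz)) <= tol_ppm

# R-O [M-H]- values quoted in the text: observed 752.2920, calculated 752.2924.
error = ppm_error(752.2920, 752.2924)
print(f"R-O mass error: {error:.2f} ppm")
print("within assumed 5 ppm window:", within_tolerance(752.2920, 752.2924))
```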
To validate our proposed enzymatic reaction mechanisms, we performed a series of 13 C-tracer NMR experiments. First, [39-13 C]R-L was prepared by mixing [1-13 C]glucose, ATP, Mg 2+ , hexokinase, G-6-P isomerase, Rif15a/Rif15b, ThDP, and R-S in a one-pot reaction. We observed that [1-13 C]glucose was phosphorylated by hexokinase to [1-13 C]G-6-P, which was subsequently transformed into [1-13 C]F-6-P by G-6-P isomerase (Fig. 3). The Rif15-mediated transfer of the 13 C-labeled glycolic acid C 2 moiety from [1-13 C]F-6-P to R-S resulted in production of [39-13 C]R-L, with an enriched C-39 signal of δ C 62.4 (Fig. 3b). The identity of this product was further confirmed by LC-HRMS analysis indicating an m/z value of 755.3106 ([M-H] − , calc. 755.3105, Supplementary Fig. 19), which is ~1 Da greater than that of unlabeled R-L ([M-H] − = 754.3069). Next, Rif16, seFdx/seFdR, and NADPH were added into the above one-pot reaction. (… Supplementary Fig. 19) and by our observation that the 13 C-labeled carbon signal shifted downfield from δ C 62.4 to δ C 67.9 (Fig. 3b); both analytical results are consistent with the conversion of R-L to R-B via R-O (Fig. 3a).
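The ~1 Da shift between labeled and unlabeled R-L can be checked against the mass difference between 13 C and 12 C. The Python sketch below uses the two [M-H] − values quoted above; the constant 1.003355 Da is the standard 13 C/12 C mass difference, and the function name is illustrative.

```python
C13_C12_DELTA = 1.003355  # Da, mass difference between 13C and 12C

def label_count(labeled_mz, unlabeled_mz, charge=1):
    """Estimate the number of 13C labels from the m/z shift of a singly or multiply charged ion."""
    shift = (labeled_mz - unlabeled_mz) * abs(charge)
    return round(shift / C13_C12_DELTA), shift

# [M-H]- values quoted in the text for labeled and unlabeled R-L.
n_labels, shift = label_count(755.3106, 754.3069)
print(f"observed shift: {shift:.4f} Da, estimated 13C labels: {n_labels}")
```

A single incorporated label is what the one-pot preparation from [1-13 C]glucose would be expected to give.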
It was previously reported that the R-SV high-producer A. mediterranei U32 has an R84W single mutation in Rif16 16 . Understanding the Rif16 mechanism allowed us to rationalize this industrially important phenotype. Specifically, the dissociation constant (K d ) of R-L toward Rif16 was determined to be 1.3 ± 0.1 μM (Supplementary Fig. 20), while the purified Rif16 R84W mutant (Supplementary Figs. 3 and 21) showed no detectable binding of R-L and lost the ability to catalyze the transformation from R-L to R-B (Supplementary Fig. 22). Since R84 is located at the B′ loop of Rif16 (Supplementary Fig. 16), which is an important region for P450 substrate recognition 27 , its replacement by a tryptophan abolishes productive substrate binding through a mechanism that remains to be elucidated. Furthermore, according to the biosynthetic network shown in Fig. 1, the U32 mutant should accumulate R-L instead of the observed R-SV and R-S 16 . We reason that the ester R-L might be unstable and could be hydrolyzed to R-SV either enzymatically or spontaneously (Supplementary Fig. 23).
Our elucidation of the network comprising the late steps of rifamycin biosynthesis revealed a unique C-O bond formation reaction mediated by a transketolase that involves both normal C-C bond formation and unusual bond rearrangements. Notably, transketolases primarily participate in central metabolic pathways such as the pentose phosphate pathway and the Calvin cycle, and there have been few reports on transketolases that are involved in natural product biosynthesis 36,37 . The ether bond formation derived from the concomitant oxidation-reduction reactions and complex bond rearrangements also represents a highly atypical reaction for a P450 reaction system. Knowledge of the slow kinetics and the optimal C 2 keto donor of Rif15 could also help guide future rational strain improvement. Finally, BLAST searches reveal other protein sequences with high similarity to Rif15 and Rif16 (Supplementary Table 3, Supplementary Fig. 24), suggesting that further Rif15-like and Rif16-like functions remain to be identified. Some of these enzymes come from rifamycin-producing microorganisms 38-41 , which may suggest an effective method for discovering more rifamycin producers by using Rif15 and Rif16 sequences as probes.

Methods
Chemicals. Rifamycin SV and rifamycin O authentic standards were purchased from Sigma Aldrich (USA) and Toronto Research Chemicals (Canada), respectively. Rifamycin S and rifamycin B authentic standards were purchased from the National Institutes for Food and Drug Control (China).
General DNA manipulation. The E. coli DH5α strain was used for plasmid construction, storage, and isolation. Fast-digest restriction endonucleases (Thermo Fisher Scientific, USA) and T4 DNA ligase (Takara, Japan) were used for construction of vectors. PCR reactions were performed using I-5 TM 2 × High-Fidelity Master Mix DNA polymerase (TsingKe Biotech, Beijing, China). Plasmid isolations from E. coli cells were performed using the Plasmid Miniprep Kit (TsingKe Biotech, Beijing, China). Purification of DNA fragments from agarose gels or PCR reactions was carried out using Gel Extraction Kit (Omega, USA) and Cycle Pure Kit (Omega, USA), respectively. Primers were synthesized by TsingKe (China).
Protein concentration determination. For Rif16 and Rif16 R84W , the UV-visible spectra were recorded on a DU 800 spectrophotometer (Beckman Coulter, USA). The CO-bound reduced difference spectrum was employed to determine the functional concentration of P450 enzymes using the extinction coefficient (ε 450nm-490nm ) of 91,000 M⁻¹·cm⁻¹ 43 . Briefly, CO was slowly bubbled through the Na 2 S 2 O 4 -reduced P450 enzyme solution using a Pasteur pipette in a fume hood. The spectra of ferric, CO-bound, and CO-bound reduced forms of the P450 enzyme were recorded between 250 and 550 nm for generation of the CO-bound reduced difference spectrum. The concentrations of other proteins were determined using the Bradford assay with bovine serum albumin as standard 44 .
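The functional P450 concentration follows from the Beer-Lambert law applied to the CO-difference spectrum. The Python sketch below uses the extinction coefficient given above; the 1 cm path length, the dilution factor, and the example absorbance are assumptions for illustration.

```python
EPSILON_450_490 = 91_000.0  # M^-1 cm^-1, extinction coefficient quoted in the text

def p450_concentration_uM(delta_a, path_cm=1.0, dilution=1.0):
    """Functional P450 concentration in µM from the CO-difference absorbance.

    delta_a  : A(450 nm) - A(490 nm) in the CO-bound reduced difference spectrum
    path_cm  : cuvette path length (1 cm assumed)
    dilution : factor by which the stock was diluted before measurement (assumed)
    """
    molar = delta_a / (EPSILON_450_490 * path_cm)
    return molar * dilution * 1e6

# Hypothetical reading: a difference absorbance of 0.091 in a 1 cm cuvette.
print(f"{p450_concentration_uM(0.091):.2f} uM")
```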
Spectral substrate binding assays. Spectral substrate binding assays were carried out on a UV-visible spectrophotometer 50 Bio (Cary, USA) at room temperature by titrating a 100 μM rifamycin L solution in DMSO (blank DMSO for the reference group) into 1 mL of 1 μM Rif16 or Rif16 R84W solution in 1 μL aliquots, giving substrate concentrations ranging from 0.1 to 1.2 μM. The series of Type I difference spectra were used to deduce ΔA (A peak(390 nm) − A trough(420 nm) ). Then, the ΔA data versus substrate concentrations were fit to the Michaelis-Menten equation to calculate the dissociation constant K d 45 .
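The titration scheme and the hyperbolic fit can be sketched in Python as follows. This is an illustration under stated assumptions: the ΔA values are synthetic (generated with a hypothetical ΔA max of 0.05 and the reported K d of 1.3 μM), and the fit uses a Hanes-Woolf linearization in place of the nonlinear fitting software used for the real data.

```python
def titration_concentrations(stock_uM, aliquot_uL, start_uL, n_aliquots):
    """Ligand concentration (µM) after each aliquot, corrected for the added volume."""
    return [stock_uM * n * aliquot_uL / (start_uL + n * aliquot_uL)
            for n in range(1, n_aliquots + 1)]

def binding_hyperbola(s, delta_a_max, kd):
    """Hyperbolic (Michaelis-Menten form) binding curve: dA = dA_max * S / (Kd + S)."""
    return delta_a_max * s / (kd + s)

def fit_hanes_woolf(s_values, delta_a_values):
    """Estimate (dA_max, Kd) by linear regression of S/dA against S.

    S/dA = S/dA_max + Kd/dA_max, so slope = 1/dA_max and intercept = Kd/dA_max.
    """
    ys = [s / a for s, a in zip(s_values, delta_a_values)]
    n = len(s_values)
    mean_x = sum(s_values) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in s_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(s_values, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return 1.0 / slope, intercept / slope

# 100 µM stock added in twelve 1 µL aliquots to 1 mL (1000 µL) of enzyme solution.
concs = titration_concentrations(100.0, 1.0, 1000.0, 12)
# Synthetic, noise-free signal (hypothetical dA_max = 0.05; Kd = 1.3 µM as reported).
signal = [binding_hyperbola(s, 0.05, 1.3) for s in concs]
delta_a_max, kd = fit_hanes_woolf(concs, signal)
print(f"{concs[0]:.3f} to {concs[-1]:.3f} uM, fitted Kd = {kd:.2f} uM")
```

Like the simple hyperbolic equation itself, this treats the added ligand as free ligand; with an enzyme concentration close to K d , a tight-binding treatment could give a somewhat different estimate.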
In vitro enzymatic assays of Rif15 and Rif16. The Rif15 reaction mixture con…
Kinetic analysis of Rif15. The kinetic analysis of Rif15 was carried out using 2-10 μM Rif15a/Rif15b, 10-100 μM R-S as substrate, 2 mM F-6-P as the C 2 keto donor, 2.5 mM MgCl 2 , and 0.5 mM ThDP in 100 μL of reaction buffer. The reactions were performed for 15-30 min at 28°C and quenched by mixing with the same volume of methanol. After high-speed centrifugation (20,000 ×g) for 15 min, the supernatants were analyzed on an Agilent 1260 Infinity HPLC system as described above. Each peak area of R-L was used to calculate the product concentration based on the standard curve of R-L. The triplicate data were fit to the Michaelis-Menten equation to determine the k cat and K m values using Origin 9.0.
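The conversion from R-L peak area to a reaction rate can be sketched in Python as below. The standard-curve slope, the peak area, and the choice to correct for the twofold methanol dilution are all assumptions for illustration; none of these numbers come from the measured data.

```python
def product_conc_uM(peak_area, slope, intercept=0.0, dilution=2.0):
    """Convert an R-L peak area to µM in the original reaction.

    slope, intercept : linear standard curve, area = slope * µM + intercept (hypothetical values)
    dilution         : 2.0 reflects quenching with an equal volume of methanol, assuming
                       the standard curve was built from undiluted standards
    """
    return (peak_area - intercept) / slope * dilution

def michaelis_menten(s_uM, kcat_per_min, km_uM, enzyme_uM):
    """Rate (µM/min) predicted by v = kcat * [E] * [S] / (Km + [S])."""
    return kcat_per_min * enzyme_uM * s_uM / (km_uM + s_uM)

# Hypothetical example: peak area 150, slope 50 area units per µM, 30 min, 2 µM enzyme.
conc = product_conc_uM(peak_area=150.0, slope=50.0)
rate = conc / 30.0       # µM of R-L formed per minute
turnover = rate / 2.0    # apparent turnover per minute at this substrate concentration
print(conc, rate, turnover)
```

Rates obtained this way at each R-S concentration are the inputs to the Michaelis-Menten fit.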
Preparation of 13 C labeled F-6-P, R-L, and R-B.
Isolation and purification of R-L and R-B. The scaled-up Rif15 (for R-L) and Rif16 (for R-B) reactions (20-100 mL) were each extracted five times with an equal volume of ethyl acetate. The extracts were then dried under nitrogen flow and re-dissolved in methanol. The purification of R-L or R-B was performed using semi-preparative HPLC (Waters XBridge TM Prep C18 5 μm, 10 × 250 mm) with a linear gradient of 40-80% acetonitrile in ddH 2 O + 0.1% trifluoroacetic acid over 20 min, and then 100% acetonitrile for 5 min at a flow rate of 2.5 mL/min. The collected fractions containing R-L or R-B were combined. The solvents were removed using a Rotavapor R-3 rotary evaporator (Buchi, Switzerland) and N 2 blowing. Finally, 10.0 mg R-L and 3.0 mg R-B with > 98.0% purity were obtained.
LC-HRMS analysis. The LC-HRMS analysis was performed on a Waters Symmetry column (4.6 × 150 mm, RP18) using negative-mode electrospray ionization with a linear gradient of 10-100% acetonitrile in ddH 2 O with 0.1% formic acid over 20 min, followed by 100% acetonitrile for 5 min at a flow rate of 0.5 mL/min. The high resolution mass spectra were recorded on a Dionex Ultimate 3000 coupled to a Bruker Maxis Q-TOF.
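The gradient program described above can be written as a simple function of time, which makes the mobile-phase composition at any point of the run explicit. This Python sketch encodes only the stated program (10-100% acetonitrile over 20 min, then 100% for 5 min, at 0.5 mL/min); the function names are illustrative.

```python
def percent_acetonitrile(t_min, start=10.0, end=100.0, ramp_min=20.0):
    """Acetonitrile percentage at time t for a linear ramp followed by a hold at `end`."""
    if t_min <= 0:
        return start
    if t_min >= ramp_min:
        return end
    return start + (end - start) * t_min / ramp_min

def acetonitrile_volume_mL(total_min=25.0, flow_mL_min=0.5, steps=2500):
    """Acetonitrile delivered over the run, by trapezoidal integration of the program."""
    dt = total_min / steps
    area = 0.0
    for i in range(steps):
        t0 = total_min * i / steps
        t1 = total_min * (i + 1) / steps
        area += 0.5 * (percent_acetonitrile(t0) + percent_acetonitrile(t1)) * dt
    return flow_mL_min * area / 100.0

print(f"composition at 10 min: {percent_acetonitrile(10.0):.0f}% acetonitrile")
print(f"acetonitrile per run: {acetonitrile_volume_mL():.2f} mL")
```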
NMR analysis. The 13 C NMR spectra of 13 C-labeled glucose, G-6-P, and F-6-P were acquired using D 2 O as solvent. The 1 H, 13 C and 2D NMR spectra of rifamycin derivatives were obtained using CDCl 3 or MeOD as solvent on a Bruker Avance III 600 MHz spectrometer with a 5 mm TCI cryoprobe.
Crystallization and structure determination. Crystals of Rif16 alone and of the Rif16/R-L complex were obtained at 16°C by hanging drop vapor diffusion. The native Rif16 crystal screen droplets consisted of a 1:1 (v/v) mixture of protein at 16 mg/mL and a well solution of 100 mM Bis-Tris, pH 6.5, 200 mM magnesium chloride hexahydrate, and 25% PEG3350. Rif16 and R-L were mixed at a molar ratio of 1:5 and the co-crystallization was carried out in 200 mM magnesium chloride hexahydrate, 100 mM Tris, pH 8.5, 30% PEG4000. Crystals appeared after 1 week and were ready for data collection in 20 days. The crystals were flash-cooled in liquid nitrogen. The diffraction data were collected at 100 K using synchrotron radiation at beamline BL19U1 of the Shanghai Synchrotron Radiation Facility (SSRF). The data sets were integrated and scaled with the HKL3000 package 48 . The structure was determined by molecular replacement with the program Phaser, using the structure of CYP105A (PDB accession code 4OQS) as the initial search model. The programs Refmac5 and Coot9 were used for the refinement and model building 49-52 . Ramachandran plots were generated with Coot9. The statistics for data processing and structure refinement are shown in Supplementary