Cryptic indole hydroxylation by a non-canonical terpenoid cyclase parallels bacterial xenobiotic detoxification

Terpenoid natural products comprise a wide range of molecular architectures that typically result from C–C bond formations catalysed by classical type I/II terpene cyclases. However, the molecular diversity of biologically active terpenoids is substantially increased by entirely unrelated, non-canonical terpenoid cyclases, whose evolutionary origin has remained enigmatic. Here we report the in vitro reconstitution of an unusual flavin-dependent bacterial indoloterpenoid cyclase, XiaF, which, together with a dedicated flavin reductase (XiaP), mediates a key step in xiamycin biosynthesis. The crystal structure of XiaF with bound FADH2 (at 2.4 Å resolution) and phylogenetic analyses reveal that XiaF is, surprisingly, most closely related to xenobiotic-degrading enzymes. Biotransformation assays show that XiaF is a dedicated indole hydroxylase that can be used for the production of indigo and indirubin. We unveil a cryptic hydroxylation step that sets the basis for terpenoid cyclization and suggest that the cyclase has evolved from xenobiotic detoxification enzymes.

Therefore, its influence was neglected in the calculation of the kinetic parameters. For an example of the determination of initial velocities (v0), see Supplementary Figure 4C and 4D.

Supplementary Note 3. Detailed Phylogenetic Analysis of XiaF
Homology searches (BLAST) identified a diverse set of 28 XiaF homologs (Figure 2B), most of which have been classified as group D flavin-dependent monooxygenases (FMOs). Various group D FMOs have been shown to catalyze aromatic hydroxylation and N-hydroxylation reactions 14. Unexpectedly, the majority of XiaF homologs are involved in the catabolism of xenobiotics rather than in secondary metabolite biosynthetic pathways, and they share with the indosespene cyclase XiaF the ability to catalyze indole hydroxylation to yield indigo 1,2,15–25. Notably, none of them has been reported to use indole as its 'natural' substrate. Instead, indole is reported to be accepted as an alternative substrate or to be co-metabolized together with the natural substrates 1,2,16,18–21,23–25. However, the distinction between natural and alternative substrates is questionable for promiscuous xenobiotic-metabolizing enzymes, especially in light of recent studies suggesting that the conversion of indole to indigo is a detoxification mechanism that allows microorganisms to deal with otherwise toxic indole 26–28.
Among the relatives of XiaF, only four enzymes have been implicated in the tailoring of secondary metabolites. However, none of them has been reported to catalyze indigo production as an alternative reaction or to be evolutionarily related to xenobiotic-metabolizing enzymes.
To investigate the phylogenetic position of XiaF, we constructed a phylogenetic tree of all 30 sequences using the Neighbor-Joining method (Figure 2B). XiaF forms a clade together with its orthologs from S. sp. SCSIO 02999 (XiaI) and A. nigrescens CSC17Ta-90 (XiaF' #4), as well as with six orthologs from other putative xia gene clusters from various actinomycetes. This subclade is part of a larger clade that includes the indigo-forming oxygenases pJEC, AcdA pB7-2 and IpoA. The former two were identified in soil metagenome screening projects: pJEC was discovered because of its ability to form indigoid pigments 15, whereas AcdA pB7-2 was identified from a soil sample that had been artificially polluted with aromatic compounds (biphenyl, phenanthrene, carbazole, and 3-chlorobenzoate) 24. Such metagenome screenings, which exploit the fact that several types of oxygenases produce indigo when heterologously expressed in suitable hosts, are frequently used to find potential biocatalysts for the oxygenation of aromatic compounds 23,24,29. IpoA has been proposed to be involved in limonene degradation, since ipoA expression together with indigo formation is specifically induced upon administration of limonene 25. Likewise, the induction of AcdA pB7-2 and IpoA production by xenobiotics strongly suggests their involvement in xenobiotic metabolism, although their natural substrates are still unknown. Another clade is formed by the secosteroid hydroxylases from Mycobacterium tuberculosis (HsaA #1) and Rhodococcus sp. Rha1 (HsaA #2), which catalyze an important step in the degradation of cholesterol 3,4, and the 4-hydroxyphenylacetate 3-hydroxylase C2-HpaH 5. Notably, these oxygenases are closely related to the bifunctional hydroxylase ActVA-ORF5, which is involved in the biosynthesis of actinorhodin 30, and to NcnH, which is presumably involved in naphthocyclinone biosynthesis 31.
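The Neighbor-Joining method builds a tree by repeatedly joining the pair of nodes i, j that minimizes Q(i, j) = (n - 2) d(i, j) - Σk d(i, k) - Σk d(j, k), where n is the number of nodes still unjoined. The Python function below is a minimal sketch of that joining step, assuming a precomputed symmetric distance matrix; the function name `neighbor_joining` and the toy taxa a to e are hypothetical. It is not the authors' pipeline, since the alignment, distance model and software behind Figure 2B are not specified in this section.

```python
from itertools import combinations


def neighbor_joining(labels, dist):
    """Toy neighbor-joining on a symmetric distance matrix (needs >= 3 taxa).

    labels: taxon names; dist: list of lists of pairwise distances.
    Returns an unrooted tree as a Newick string with branch lengths.
    """
    # Dict-of-dicts so that internal nodes can be added and removed easily.
    d = {a: {} for a in labels}
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            d[a][b] = float(dist[i][j])
    nodes = list(labels)

    while len(nodes) > 3:
        n = len(nodes)
        # Net divergence r_i = sum of distances from i to all other nodes.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j;
        # on ties the first pair encountered is kept (an arbitrary choice).
        best = None
        for a, b in combinations(nodes, 2):
            q = (n - 2) * d[a][b] - r[a] - r[b]
            if best is None or q < best[0]:
                best = (q, a, b)
        _, a, b = best
        # Branch lengths from the joined pair to their new parent node u.
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distances from u to every remaining node.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = 0.5 * (d[a][k] + d[b][k] - d[a][b])
        nodes = [k for k in nodes if k not in (a, b)] + [u]

    # Three nodes left: attach them to a single central node.
    x, y, z = nodes
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = 0.5 * (d[x][y] + d[y][z] - d[x][z])
    lz = 0.5 * (d[x][z] + d[y][z] - d[x][y])
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"


if __name__ == "__main__":
    toy = [[0, 5, 9, 9, 8],
           [5, 0, 10, 10, 9],
           [9, 10, 0, 8, 7],
           [9, 10, 8, 0, 3],
           [8, 9, 7, 3, 0]]
    print(neighbor_joining(list("abcde"), toy))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

On this toy matrix the sketch first joins a and b, then joins c with that pair, and finally places d and e around the central node. Bootstrap support, as reported in the detailed cladograms, would come from repeating such a construction on resampled alignments.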
All of the above-mentioned enzymes seem to be related to the phenol monooxygenase PheA 16. Altogether, it appears that various secondary metabolite tailoring enzymes share an evolutionary origin with enzymes of xenobiotic-metabolizing/detoxification pathways, although this relationship has so far been overlooked. In addition to these polyketide hydroxylases, the XiaF homologs also include the N-oxygenases KijD3 32 and DnmZ 7, which are involved in the biosynthesis of kijanimicin and baumycin, respectively.
All homologs of XiaF have been functionally characterized to varying extents, except for the putative XiaF orthologs from other putative xia gene clusters, which remain uncharacterized.
PheA from Geobacillus stearothermophilus has been shown in vivo and in vitro (crude enzyme extract) to catalyze the conversion of phenol to catechol in the degradation pathway of phenol.
Furthermore, PheA has been reported to catalyze the conversion of indole to indigo and to be active on cresols 16.
IacA from Acinetobacter baumannii ATCC 19606 (IacA #1) has been shown in vitro to catalyze the conversion of indole to indigo. Furthermore, a mutagenesis study indicated that IacA #1 is involved in the catabolism of the plant hormone indole-3-acetic acid (IAA), which the bacterium uses as an energy source 21.
Crude extracts from E. coli overexpressing iacA have been shown to oxidize IAA 21 and 2-hydroxy-IAA has been suggested to be the product of the IacA reaction 37 .
IacA from Pseudomonas putida (IacA #2) has been shown in vivo (in E. coli) to catalyze the production of indigo. Furthermore, IacA #2 has been suggested to be involved in the catabolism of IAA 19.
IdoA from Pseudomonas alcaligenes PA-10 has been shown in vivo (in E. coli) to catalyze the production of indigo. Furthermore, the polycyclic aromatic hydrocarbon (PAH) fluoranthene has been suggested as a substrate of IdoA after an idoA gene deletion mutant had lost the ability to degrade this compound 18.
IpoA from Rhodococcus sp. T104 has been shown in vivo (in E. coli) to catalyze the production of indigo. Moreover, IpoA has been suggested to take part in the degradation of limonene, since ipoA expression together with indigo formation is specifically induced by feeding limonene 25.
IcpA cloneM103 and IcpA cloneM123, which were discovered in a metagenomic approach aiming to find biocatalytic enzymes in samples of activated sludge used to treat coke plant wastewater, have been shown in vivo (in E. coli) to catalyze the production of indigo. Furthermore, IcpA cloneM103 has been shown to catalyze the hydroxylation of 4-nitrotoluene to 4-nitrobenzyl alcohol in vivo (in E. coli) 23.
Both IcpA enzymes have been suggested to work together with the flavin reductase IcpB 23.
C2-HpaH from Acinetobacter baumannii has been characterized in vitro 5, and its crystal structure has been solved 38. This enzyme catalyzes the hydroxylation of 4-hydroxyphenylacetate at C-3, yielding 3,4-dihydroxyphenylacetate in the degradation pathway of phenolic compounds, and works together with the flavin reductase C1-HpaH 5. In vitro, C2-HpaH has also been shown to accept 3-(4-hydroxyphenyl)propionate, 4-hydroxybenzoate and 4-nitrophenol as alternative substrates 39.
HsaA from Rhodococcus sp. Rha1 (HsaA #2) 4
Artificial pollution of soil with the four aromatic compounds biphenyl, phenanthrene, carbazole, and 3-chlorobenzoate led to the discovery of AcdA pB6-2 and AcdA pB7-2 in a metagenomic screen. Both enzymes, together with various unrelated oxygenases that seem to be involved in the degradation of recalcitrant aromatic compounds, were identified owing to their ability to catalyze the production of indigo in vivo (in E. coli) 24. AcdA pB6-2 and AcdA pB7-2 have not been characterized further.
A metagenomic screen of forest soil also led to the discovery of pJEC owing to its ability to catalyze the production of indigo in vivo (in E. coli). This enzyme has not been characterized in detail either 15.
BEC was shown in vivo to catalyze the production of indigo when blue-pigmented E. coli transformants appeared during the construction of a genomic library of Ralstonia eutropha HF39 17. BEC has not been biochemically characterized.
The sugar N-oxygenase KijD3 has been characterized in vitro 40, and its crystal structure has been solved 32,40. KijD3 has been shown to produce a hydroxylamino species in the biosynthesis of the nitro-containing sugar D-kijanose 40. KijD3 has also been proposed to catalyze the further oxidation of the hydroxylamino species to D-kijanose, although this reaction has never been observed in vitro 32,40.
ActVA-ORF5 has been characterized in vivo and in vitro 30,43. ActVA-ORF5 is involved in the biosynthesis of actinorhodin and has been proposed to convert the dihydroxynaphthalene partial structure of 6-deoxy-dihydrokalafungin (DDHK) into its tetrahydroxynaphthalene form by sequential hydroxylation of DDHK at C-6 and C-8 30. The ability of ActVA-ORF5 to catalyze C-6 oxygenation has been shown in vitro using emodinanthrone as a model substrate 30,43. Moreover, the C-8 oxygenation activity of ActVA-ORF5 has been shown in vivo by complementation of an engineered Streptomyces coelicolor strain that expressed the minimal gene set for the production of
The structural alignment was calculated with the DALI server 48. The first column shows the superposition of the different monomers, the middle column displays a close-up view of the substrate-binding pocket, and the last column summarizes the reaction catalyzed by each flavin-dependent monooxygenase. The substrate-binding pockets of the different monooxygenases comprise the following residues: A) N91, L98, S121, F123, I237, M240, H371 and M373; B) Q123, S146, I148, R263, H396 and Y398; C) I88, H95, S118, Y120, H237, H368, A370 and M393; D) Y96, F250, F258, H391, F415 and T416; E) M102, S141, I143, Q254, S285, P368 and I385.

Supplementary Figure 8. Detailed Cladogram of XiaF and Related Flavoenzymes (Neighbor-Joining Method) 49
The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) is shown next to the branches. Enzymes that have been reported to catalyze indigo formation are colored in blue; deduced gene products (XiaF orthologs) of other putative xia gene clusters that have not been characterized are colored in grey.
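The bootstrap values in these cladograms are the percentage of replicate trees in which a given group of taxa clusters together. The Python sketch below illustrates that bookkeeping under stated assumptions: each pseudo-replicate alignment is obtained by drawing alignment columns with replacement, each replicate tree has already been built and is summarized as a collection of clades (sets of taxon names), and the function names and taxa A to D are hypothetical. Comparing clades as plain sets presumes consistently rooted trees; for unrooted trees one would compare bipartitions instead.

```python
import random


def resample_columns(alignment, rng):
    """Build one bootstrap pseudo-replicate by drawing alignment columns with replacement."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]


def clade_support(clade, replicate_trees):
    """Percentage of replicate trees that contain `clade` (a set of taxon names).

    Each replicate tree is represented as an iterable of clades (sets of taxa).
    """
    target = frozenset(clade)
    hits = sum(1 for tree in replicate_trees
               if target in {frozenset(c) for c in tree})
    return 100.0 * hits / len(replicate_trees)


if __name__ == "__main__":
    rng = random.Random(1)
    print(resample_columns(["ACGT", "ACGA", "TCGA"], rng))
    # Hypothetical replicate trees for taxa A, B, C, D (clades only).
    trees = [[{"A", "B"}, {"A", "B", "C"}],
             [{"A", "B"}, {"C", "D"}],
             [{"A", "C"}, {"A", "B", "C"}],
             [{"A", "B"}]]
    print(clade_support({"A", "B"}, trees))  # 75.0
```

With 1,000 replicates, as in the captions, the same counting gives the branch labels directly as percentages.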

Supplementary Figure 9. Detailed Cladogram of XiaF and Related Flavoenzymes (Minimum Evolution Method) 50
The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) is shown next to the branches. Enzymes that have been reported to catalyze indigo formation are colored in blue; deduced gene products (XiaF orthologs) of other putative xia gene clusters that have not been characterized are colored in grey.

Supplementary Tables
Supplementary Table 1
[a] The values in parentheses for resolution range, completeness, Rmerge and I/σ(I) correspond to the highest-resolution shell.
[b] $R_{\mathrm{merge}}(I) = \sum_{hkl}\sum_{j} \lvert I(hkl)_j - \langle I(hkl)\rangle \rvert \,/\, \sum_{hkl}\sum_{j} I(hkl)_j$, where $I(hkl)_j$ is the $j$-th measurement of the intensity of reflection $hkl$ and $\langle I(hkl)\rangle$ is the average intensity of that reflection.
[c] $R = \sum_{hkl} \big\lvert \lvert F_{\mathrm{obs}}\rvert - \lvert F_{\mathrm{calc}}\rvert \big\rvert \,/\, \sum_{hkl} \lvert F_{\mathrm{obs}}\rvert$, where Rfree is calculated without a sigma cut-off for a randomly chosen 5% of reflections that were not used for structure refinement, and Rwork is calculated for the remaining reflections.
[d] Deviations from ideal bond lengths/angles.
[e] Number of residues in favored region / allowed region / outlier region.
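The footnote definitions of Rmerge, Rwork and Rfree can be written out directly in code. The Python sketch below is a minimal illustration, assuming the data are available as plain lists and dictionaries; the function names and the toy intensities and amplitudes are hypothetical. Crystallographic software computes these statistics from its own reflection files, and also per resolution shell as footnote [a] indicates, rather than like this.

```python
import random
from statistics import mean


def r_merge(observations):
    """R_merge = sum_hkl sum_j |I(hkl)_j - <I(hkl)>| / sum_hkl sum_j I(hkl)_j.

    `observations` maps a reflection index (h, k, l) to the list of its
    repeated intensity measurements I(hkl)_j.
    """
    numerator = 0.0
    denominator = 0.0
    for intensities in observations.values():
        avg = mean(intensities)  # <I(hkl)>
        numerator += sum(abs(i - avg) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


def r_factor(f_obs, f_calc):
    """R = sum | |F_obs| - |F_calc| | / sum |F_obs| over the reflections given."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


def split_free_set(hkls, fraction=0.05, seed=0):
    """Randomly set aside `fraction` of reflections (no sigma cut-off) as the test set.

    Returns (work_set, free_set): R_work is computed on the first, R_free on the second.
    """
    rng = random.Random(seed)
    free = set(rng.sample(list(hkls), max(1, round(fraction * len(hkls)))))
    work = [h for h in hkls if h not in free]
    return work, [h for h in hkls if h in free]


if __name__ == "__main__":
    obs = {(1, 0, 0): [10.0, 12.0], (0, 1, 0): [20.0, 20.0, 23.0]}
    print(round(r_merge(obs), 4))  # 0.0706
    print(r_factor([100.0, 50.0], [90.0, 55.0]))  # 0.1
```

Because the test reflections are excluded from refinement, applying `r_factor` to the free set gives an unbiased check on the model, which is why Rfree is reported alongside Rwork.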