Introduction

The enediynes represent one of the most fascinating families of natural products for their unprecedented molecular architecture and extraordinary biological activities, and they have had a profound impact on modern chemistry, biology and medicine.1, 2, 3, 4 Since the structure of the neocarzinostatin (NCS) chromophore was first elucidated in 1985,5 the enediyne family of natural products has grown steadily with a total of 15 enediynes structurally characterized to date, of which four were isolated in the cycloaromatized form.4 The enediynes are classified into two subcategories according to the size of the enediyne core structures.1, 2 Members of the 9-membered enediyne core subcategory include NCS, C-1027, kedarcidin, maduropeptin, N1999A2, the sporolides, the cyanosporasides (CYA, CYN) and the fijiolides (Figure 1a). Members of the 10-membered enediyne core subcategory include the calicheamicins (CAL), esperamicins (ESP), dynemicin (DYN), namenamicin, shishijimicin and uncialamycin (Supplementary Figure S1). The enediynes have provided an outstanding opportunity to decipher the genetic and biochemical basis for the biosynthesis of complex natural products,1, 2, 3, 4 to explore ways to make novel analogues by manipulating genes governing their biosynthesis,6, 7, 8, 9, 10 and to discover new enediyne natural products by mining microbial genomes for the trademark enediyne biosynthetic machineries.3, 4, 11, 12

Figure 1

Structures of the 9-membered enediyne natural products and their biosynthetic gene clusters. (a) Structures of the five 9-membered enediyne natural products (NCS, C-1027, kedarcidin, maduropeptin, N1999A2) and the four additional natural products (fijiolide, sporolides, CYA, CYN) proposed to be derived from 9-membered enediyne precursors after cycloaromatization. The 9-membered enediyne cores or their aromatized products are highlighted in red. (b) Alignment of the seven known 9-membered enediyne biosynthetic gene clusters highlighting the enediyne PKS gene cassette (that is, E3, E4, E5, E, E10) (shown in red) and the seven additional conserved genes (that is, E2, E7, E8, E9, E11, M, J) (color-coded). C-1027 biosynthetic gene cluster nomenclature is used. SgcJ and its homologues reported in this study are shown in blue and highlighted with blue boxes. Additional SgcJ homologues were also noted in the CYA and CYN clusters (boxed with dotted blue lines), but they were not included in the current study due to their varying length and significantly lower amino acid sequence homology.

The first set of biosynthetic gene clusters for the 9-membered enediyne C-1027 13 and the 10-membered enediyne CAL14 was cloned in 2002. Since then, a total of seven biosynthetic gene clusters for 9-membered enediynes (that is, C-1027,13 NCS,15 maduropeptin (MDP),16 kedarcidin,17 sporolides,18 CYA19 and CYN 19) and three biosynthetic gene clusters for 10-membered enediynes (that is, CAL,14 ESP (partial)11 and DYN20) have been reported. Comparative analysis of these gene clusters revealed a set of five genes common to both 9- and 10-membered enediynes (that is, the enediyne polyketide synthase (PKS) cassette consisting of E3/E4/E5/E/E10 (Figure 1b)), characterization of which has unambiguously established (i) the polyketide origin for both 9- and 10-membered enediynes and (ii) a convergent model for enediyne biosynthesis.3, 4, 21, 22 Although significant progress has been made toward elucidating the biosynthesis of the peripheral moieties present in enediynes, little is known about the enediyne core biosynthesis. In vivo and in vitro studies have established that the iterative type I PKS enzyme E initiates both 9- and 10-membered enediyne core biosynthesis via an acyl carrier protein-tethered linear polyene intermediate, which, in the absence of other enediyne PKS-associated enzymes, could be released by the thioesterase E10 to afford a heptaene.21, 22, 23, 24, 25 However, the enzymes and chemistry responsible for converting the heptaene, or the nascent acyl carrier protein-tethered linear polyene intermediate, into the 9- and 10-membered enediyne cores remain elusive. Many of the candidate genes predicted to be associated with enediyne core biosynthesis are annotated to encode proteins of unknown function.3, 4 Inactivation of these candidate genes in vivo afforded mutant strains that often failed to accumulate any biosynthetic intermediate, revealing few clues for their function in enediyne core biosynthesis. The lack of functional prediction, together with the unavailability of suitable substrates, essentially precludes any practical attempt to directly characterize these proteins biochemically in vitro.

Here we report the crystal structures of SgcJ and its homologue NCS-Orf16, together with gene inactivation and site-directed mutagenesis studies, to gain insight into enediyne core biosynthesis. We first closely examined the seven gene clusters that encode 9-membered enediyne biosynthesis and uncovered seven genes (E2, E7, E8, E9, E11, M and J), in addition to the five genes (that is, E3/E4/E5/E/E10) encoding the enediyne PKS cassette, that are absolutely conserved but whose function could not be predicted on the basis of bioinformatics analysis alone. We then subjected these targets to high-throughput structural biology analysis. This effort resulted in several structures, including SgcJ from the C-1027 biosynthetic machinery and its homologue NCS-Orf16 from the NCS biosynthetic machinery. We next confirmed that SgcJ is absolutely required for C-1027 biosynthesis: inactivation of sgcJ in the C-1027 overproducer Streptomyces globisporus SB1022 10 completely abolished C-1027 production in the resultant ΔsgcJ mutant strain SB1027. We finally showed that SgcJ and NCS-Orf16 share a common structure with the nuclear transport factor 2 (NTF2)-like superfamily of proteins, featuring a hydrophobic pocket in the α+β barrel structure that could constitute a putative substrate binding or catalytic active site. Site-directed mutagenesis of the conserved residues lining this site abolished C-1027 production, suggesting that SgcJ and its homologues may play a catalytic role in the 9-membered enediyne core biosynthesis.

Results and discussion

SgcJ and homologues are conserved among the 9-membered enediyne biosynthetic gene clusters but their function could not be predicted

Inspired by the enediyne PKS cassette, consisting of E3/E4/E5/E/E10, that is conserved among the seven 9-membered and three 10-membered enediyne biosynthetic gene clusters characterized to date, we recently completed a virtual survey of all bacterial genomes available in public databases using the enediyne PKS cassette as a probe.3, 4 This effort resulted in the identification of an additional 77 putative enediyne biosynthetic gene clusters, implying that enediynes are more common than currently appreciated on the basis of structurally characterized enediyne natural products.1, 2, 3, 4, 11, 12 We subsequently constructed an enediyne genome neighborhood network, including both the 10 known and 77 putative enediyne gene clusters, to facilitate cluster annotation and predict 9- and 10-membered enediyne core biosynthesis. The enediyne PKS cassette is present in all 87 gene clusters, suggesting that it may be responsible for biosynthesis of a common intermediate for both 9- and 10-membered enediyne cores. Subsets of genes that are unique to either 9- or 10-membered enediyne gene clusters were also identified, as exemplified by the E2, E7, E8, E9, E11, M and J genes from the seven known 9-membered enediyne biosynthetic gene clusters (Figure 1b), and they may play roles in diversifying the common intermediate into the 9- or 10-membered enediyne cores, respectively.3, 4

Among this set of genes is SgcJ (Figure 1b), and its homologues are present in the 34 putative 9-membered enediyne biosynthetic gene clusters (Supplementary Figure S2). SgcJ and its homologues comprise 140–160 amino acids, with amino acid sequence identities ranging from 30 to 66%. According to the BLASTP search result, SgcJ homologues feature a domain of unknown function (DUF4440) and belong to the NTF2-like superfamily, a large group of related proteins that share a common protein fold. The NTF2-like superfamily proteins are widely found in both prokaryotic and eukaryotic organisms and possess versatile functions.26 Proteins in the NTF2-like superfamily are generally divided into two categories, enzymatically active and non-enzymatically active proteins. The former group includes enzymes with varying activities such as the ketosteroid isomerase,27 scytalone dehydratase28 and polyketide cyclase.29, 30 The latter group includes proteins that could play roles as diverse as facilitating protein transport into the nucleus31 or mediating multimerization of calcium/calmodulin-dependent protein kinase II (CaMKII),32 or may function as a receptor.33 The enediyne variants of SgcJ show less than 18% amino acid sequence identity to functionally characterized NTF2-like proteins. Owing to the diverse functions of the NTF2-like superfamily, bioinformatics analysis alone fell short of predicting the function of SgcJ and its homologues in the 9-membered enediyne core biosynthesis.
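As an illustration of how a pairwise sequence identity such as those quoted above can be computed from an alignment, the following minimal sketch counts identical residues over the alignment columns in which neither sequence has a gap. This is an assumption about the convention: identity can also be normalized by the full alignment length or by the shorter sequence, and the text does not state which definition was used. The toy sequences are hypothetical and are not SgcJ or NCS-Orf16 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumed convention for illustration only; other denominators are in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns with the same residue
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment (not real protein sequences).
toy_a = "MTW-DFYA"
toy_b = "MSWQDF-A"
toy_identity = percent_identity(toy_a, toy_b)
print(f"{toy_identity:.1f}% identity")
```

In practice the two strings would be taken from rows of a multiple sequence alignment such as the one rendered in Figure 3a.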

Gene inactivation reveals that sgcJ is necessary for enediyne biosynthesis

To establish a functional linkage of sgcJ and its homologues with enediyne biosynthesis, we inactivated sgcJ in the C-1027 overproducer S. globisporus SB1022 10 by replacing it with the kanamycin resistance cassette through λ-RED-mediated PCR targeting mutagenesis34 (Supplementary Figure S3a). The genotype of the resulting ΔsgcJ mutant strain SB1027 was confirmed by PCR and Southern analysis (Supplementary Figure S3b). SB1027 was fermented under the established conditions for C-1027 production with S. globisporus SB1022 as a positive control.9, 10, 13, 35 Although C-1027 production by SB1022 was readily confirmed by both the bioassay against Micrococcus luteus and the biochemical induction assay (BIA), C-1027 production was completely abolished in SB1027, which was unambiguously verified by HPLC and ESI-MS analysis (Figure 2, panels I and II). The requirement for sgcJ in C-1027 biosynthesis was further supported by the fact that the ΔsgcJ mutation in SB1027 could be complemented by expressing a functional copy of sgcJ in trans, restoring C-1027 production in the complementation strain SB1028 to a level comparable to that of SB1022 (Figure 2, panels I and III). Taken together, these data clearly established that SgcJ plays a necessary role in C-1027 biosynthesis and, by analogy, that SgcJ homologues play an essential role in 9-membered enediyne core biosynthesis. However, SB1027 failed to accumulate any biosynthetic intermediate to sufficient levels for isolation and structural characterization, revealing no clues for the exact function of SgcJ. We therefore opted to solve the structures of SgcJ and its homologues in an attempt to elucidate their function in 9-membered enediyne core biosynthesis.

Figure 2

C-1027 production by S. globisporus SB1022 and derived recombinant strains. HPLC analysis for C-1027 production from fermentation of (I) SB1022, (II) SB1027, (III) SB1028, (IV) SB1029, (V) SB1030, (VI) SB1031, (VII) SB1032, (VIII) SB1033, (IX) SB1034 and (X) SB1035. Symbols denote C-1027 chromophore (♦) and aromatized C-1027 (). Biochemical induction assays (insets) showed the production of C-1027 in SB1022, loss of C-1027 production in SB1027 and restored C-1027 production in SB1028. A full color version of this figure is available at The Journal of Antibiotics journal online.

The overall structure of SgcJ and NCS-Orf16 reveals structural similarity to NTF2-like superfamily proteins

The crystals of SgcJ were obtained in the monoclinic space group C2 with unit cell parameters a=72.7, b=86.9 and c=55.3 Å and α=γ=90.0°, and β=121.6°. The asymmetric unit contained two peptide chains, corresponding to a solvent content of 50.9%. The asymmetric unit also contained molecules of citric acid, glycerol, phosphate, pentaethylene glycol and tetraethylene glycol, which were present in the crystallization condition. The final model of SgcJ was refined to a resolution of 1.7 Å with an R factor of 16.9% and an Rfree factor of 19.5%. Ramachandran analysis revealed that 99.6% of the residues were in the favored region with none in disallowed regions. The electron density map was well-defined for residues Ser3-Asp140 and Ala10-Asp140 for the two polypeptide chains in the asymmetric unit. Data collection and refinement statistics are summarized in Table 1.
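The relationship between the unit cell and the quoted solvent content can be illustrated through the Matthews coefficient. The sketch below is not the procedure used to generate Table 1. It assumes the standard monoclinic cell volume formula, four asymmetric units per cell for space group C2, the conventional approximation of solvent fraction as 1 − 1.23/V_M, and a hypothetical chain mass of 15,000 Da, which is not stated in the text.

```python
import math


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Unit cell volume in Å^3 for a monoclinic cell (alpha = gamma = 90°)."""
    return a * b * c * math.sin(math.radians(beta_deg))


def matthews_coefficient(cell_volume: float, asu_per_cell: int,
                         chains_per_asu: int, chain_mass_da: float) -> float:
    """Matthews coefficient V_M in Å^3 per Da."""
    return cell_volume / (asu_per_cell * chains_per_asu * chain_mass_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction using the conventional 1.23 Å^3/Da factor."""
    return 1.0 - 1.23 / vm


# Cell parameters are taken from the text; the chain mass is a HYPOTHETICAL
# placeholder because the molecular mass of SgcJ is not given here.
ASSUMED_CHAIN_MASS_DA = 15000.0
volume = monoclinic_cell_volume(72.7, 86.9, 55.3, 121.6)
vm = matthews_coefficient(volume, asu_per_cell=4, chains_per_asu=2,
                          chain_mass_da=ASSUMED_CHAIN_MASS_DA)
solvent = solvent_fraction(vm)
print(f"V = {volume:.0f} Å^3, V_M = {vm:.2f} Å^3/Da, solvent = {100 * solvent:.1f}%")
```

The result depends directly on the assumed chain mass, so replacing the placeholder with the actual mass of the crystallized construct is required before comparing against the reported value.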

Table 1 Data collection, phasing and refinement statistics for the SgcJ and NCS-Orf16 structures

The NCS-Orf16 crystals were obtained in the monoclinic space group P21 with unit cell parameters a=98.3, b=52.8 and c=131.8 Å and α=γ=90.0°, and β=90.1°. The asymmetric unit contained 10 peptide chains, corresponding to a solvent content of 45.0%. The final model of NCS-Orf16 was refined to a resolution of 2.72 Å with an R factor of 21.6% and an Rfree factor of 25.6%. Ramachandran analysis revealed that 97.2% of the residues were in the favored region with none in disallowed regions. The electron density map was well-defined for residues Thr19-Arg142 for each peptide chain in the asymmetric unit. Data collection and refinement statistics are summarized in Table 1.

SgcJ and its homologues show high amino acid sequence homology (Figure 3a), with SgcJ and NCS-Orf16 sharing 45% amino acid sequence identity. The crystal structures of SgcJ and NCS-Orf16 feature a common three-dimensional structural fold (Figure 3b). The structure of SgcJ superimposed well with NCS-Orf16 with a root-mean-square deviation (rmsd) of 0.83 Å for the Cα atoms. The overall structures of SgcJ and NCS-Orf16 form a cone-like α+β barrel, each comprising a long N-terminal α-helix (α1-α2) passing through the curved six-stranded antiparallel β-sheet (β1-β6), with two additional shorter α-helices (α3 and α4) adjacent to the α1-α2 helix (Figure 3b). The β-sheet packs against the three α-helices to form a hydrophobic core within the α+β barrel (Figure 3b). In the crystals, SgcJ and NCS-Orf16 are packed as homodimers in the asymmetric unit, with the monomers related by non-crystallographic twofold axes (Figure 3c). The dimer interface is formed via a hydrogen-bonding network and salt bridges between the flat faces of the β-sheets from each monomer. Both SgcJ and NCS-Orf16 were indeed found to be homodimers in solution by size exclusion chromatography (Supplementary Figure S4).
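For readers unfamiliar with the rmsd metric quoted above, the following minimal sketch computes the root-mean-square deviation between two equal-length lists of paired Cα coordinates. It assumes the two structures have already been superimposed and that the atom pairing is known. The optimal rigid-body fitting step performed by structure alignment programs is deliberately omitted, and the coordinates are hypothetical toy values rather than data from the SgcJ or NCS-Orf16 models.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(coords_a: Sequence[Coord], coords_b: Sequence[Coord]) -> float:
    """RMSD in Å between paired, pre-superimposed Cα coordinates."""
    if len(coords_a) != len(coords_b) or len(coords_a) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical toy coordinates: every atom in the second set is shifted 1 Å in z.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
toy_rmsd = ca_rmsd(toy_a, toy_b)
print(f"rmsd = {toy_rmsd:.2f} Å")
```

Because no superposition is performed, applying this function to raw coordinates from two different crystal forms would overestimate the deviation.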

Figure 3

Sequence alignment of SgcJ and its homologues and overall structure of SgcJ and NCS-Orf16. (a) Sequence alignment of SgcJ and its homologues from the seven known 9-membered enediyne biosynthetic gene clusters. Aligned residues are colored on the basis of the level of conservation. Yellow background with red character and cyan background with red character show the putative general base and acid, respectively. Red background with white character shows strict identity, red character similarity and blue frame similarity across groups. The alignment was created with MUSCLE54 and rendered with ESPript 3.0.55 (b) Ribbon diagram and structural alignment of the SgcJ and NCS-Orf16 monomers. SgcJ and NCS-Orf16 are shown in light blue and yellow, respectively. (c) Ribbon diagram of the SgcJ dimer. The two chains are colored in blue and light blue, respectively. The NCBI accession numbers for each of the proteins are: SgcJ (ALU98438), NCS-Orf16 (AAM77985), MdpJ (ABY66022), KedJ (AFV52149), SpoJ (WP028564083), CyaJ (AG0972300) and CynJ (AG097162).

Consistent with the BLASTP search result, a search of the PDB databank using the DALI server36 revealed that SgcJ and NCS-Orf16 belong to the NTF2-like superfamily. This versatile superfamily is a classic example of divergent evolution wherein the proteins have similar overall structures but diverge greatly in their functions.26, 37 Crystal structures have been reported for several functionally characterized NTF2-like superfamily proteins, including the association domain of CaMKII from mouse (PDB entry 1HKX),32 NTF2 from rat (PDB entry 1OUN),38 ketosteroid isomerase (KSI) from Pseudomonas putida (PDB entry 1OPY),27 scytalone dehydratase (mgSD) from Magnaporthe grisea (PDB entry 1STD),28 and polyketide cyclases SnoaL from Streptomyces avidinii (PDB entry 1SJW)29 and Tcm ARO/CYC from Streptomyces glaucescens (PDB entry 2RER).30 Despite low amino acid sequence identities, ranging from 9.8 to 17.5%, SgcJ was found to share similar folds with each of the NTF2-like superfamily proteins listed, with rmsds of 3.0, 2.6, 2.7, 2.7, 3.1 and 3.0 Å for the Cα atoms, respectively (Figure 4a). SgcJ, NTF2, KSI and SnoaL form homodimers, while mgSD and CaMKII form a trimer and tetradecamer, respectively. Most importantly, all these NTF2-like superfamily proteins contain a hydrophobic pocket in the α+β barrel structure (Figure 4a), which forms a cavity that could be adapted to create an enzyme active site or a small molecule/peptide binding site, thereby supporting the versatile functions of the superfamily.

Figure 4

Structure comparison between SgcJ and selected homologues in the NTF2-like superfamily. (a) Each structure shows a curved antiparallel β-sheet wall with a group of α-helices on one side of the wall to form the cone-like shape. The putative general base and acid are shown as yellow and green sticks, respectively. The water molecule in the structure of mgSD is depicted by a red dot. Given in parentheses are PDB accession codes for each of the structures. (b) The proposed mechanisms for KSI, mgSD, SnoaL and Tcm ARO/CYC, featuring the conserved general acid-base pairs that catalyze the initial steps of the reactions.

Putative substrate binding cavity and catalytic residues of SgcJ and its homologues

KSI (isomerase),27 mgSD (dehydratase),28 SnoaL (cyclase)29 and Tcm ARO/CYC (cyclase)30 are enzymatically active proteins within the NTF2-like superfamily. Although their functions are different, they share a common catalytic mechanism: (i) a general base abstracts a proton from Cα of a carbonyl group to form an enolate intermediate, which is stabilized by a general acid; (ii) the enolate intermediate tautomerizes back to the carbonyl group followed by double bond rearrangement or nucleophilic attack (Figure 4b). In KSI, mgSD, SnoaL and Tcm ARO/CYC, the general acid-base pairs that initiate the reactions are Asp40-Tyr16/Asp103, His85-a water bound by Tyr30 and Tyr50, Asp121-Gln105, and Tyr35-Arg69, respectively (Figure 4b). Interestingly, the crystal structures of SgcJ and NCS-Orf16 reveal conserved Asp111-Tyr72 and Asp115-Tyr76 pairs located at the entrance of the pocket, respectively (Figure 4a). Since these amino acids are known to act as the general acid-base pair in catalysis, it is tempting to speculate that SgcJ may play a similar catalytic role in transforming the nascent linear polyene intermediate along with other enediyne PKS-associated enzymes, into the 9-membered enediyne core.

Additionally, both SgcJ and NCS-Orf16 form a hydrophobic cavity within their pockets. Despite less than 50% amino acid sequence identity between SgcJ and NCS-Orf16, the amino acids lining the cavities are conserved: Trp29, Phe37, Tyr72, Trp118 and Tyr132 in SgcJ versus Trp32, Phe40, Tyr76, Trp122 and Tyr136 in NCS-Orf16 (Figures 3a and 5). Intriguingly, in the crystal structure of SgcJ, a molecule of pentaethylene glycol (1PE in chain A) and one of tetraethylene glycol (PG4 in chain B) were found bound in the cavity and surrounded by the conserved amino acid residues (Figure 5a). These polyethylene glycol molecules may mimic the binding of the linear polyene intermediate, which is sequestered and stabilized by the conserved aromatic residues lining the cavity during biosynthesis of the otherwise unstable 9-membered enediyne core intermediates.

Figure 5

Structure comparison between SgcJ and NCS-Orf16. The electrostatics diagrams (left) and the conserved residues around the cavities (right) of (a) SgcJ and (b) NCS-Orf16. The cavities are shown by gray transparency. The ligand tetraethylene glycol (PG4) molecule is colored in green. The 2Fo–Fc electron density map is contoured at 1.0σ.

The putative general acid-base catalytic pair and the amino acids lining the cavity are partially conserved among the SgcJ homologues in the seven known (Figure 3a) and 34 putative (Supplementary Figure S2) 9-membered enediyne biosynthetic gene clusters. To provide additional experimental data to support the catalytic role SgcJ and its homologues may play in enediyne core biosynthesis, we mutated each of the six conserved residues in SgcJ by site-directed mutagenesis (that is, D111A and Y72A for the putative general acid-base pair, and W29A, F37A, W118A and Y132A for the residues lining the cavity). The expression constructs (pBS1148 to pBS1153) for the mutant variants of sgcJ were otherwise identical to pBS1146, with the expression of sgcJ or its mutant variants under the control of the constitutive ErmE* promoter. Introduction of pBS1148-pBS1153 individually into SB1027 afforded SB1030-SB1035, respectively, which were fermented, with SB1028 as a positive control, to examine if they could complement the ΔsgcJ mutation in SB1027. Gratifyingly, none of the six mutants restored C-1027 production (Figure 2, panels III and V-X), consistent with the proposal that these conserved residues are involved in substrate recognition, catalysis or both. Taken together, we now propose that SgcJ plays a catalytic role in transforming the linear polyene intermediate, along with other enediyne PKS-associated enzymes, into an enzyme-sequestered 9-membered enediyne core intermediate.

SgcJ and its homologues are pathway specific for enediyne biosynthesis

We have previously demonstrated that PKSEs and thioesterases from different 9- and 10-membered enediyne machineries are freely interchangeable and that 9- versus 10-membered enediyne core biosynthetic divergence occurs beyond the PKSE-thioesterase chemistry.21, 22 Given the sequence homology among SgcJ and its homologues (Figure 3a and Supplementary Figure S2) and the structural similarity as exemplified by SgcJ and NCS-Orf16 (Figures 3b, 4a and 5), as well as the common catalytic role proposed for SgcJ and homologues in 9-membered enediyne core biosynthesis, we finally asked if SgcJ and its homologues are pathway specific. An expression vector pBS1147 for ncs-orf16 was constructed in the same way as pBS1146 for sgcJ, with the expression of ncs-orf16 under the control of the constitutive promoter ErmE*. Introduction of pBS1147 into the ΔsgcJ mutant strain SB1027 afforded SB1029, which offered the opportunity to examine if ncs-orf16 could cross-complement the ΔsgcJ mutation in SB1027. SB1029 was fermented with SB1028 as a positive control. Cross-complementation was not observed: HPLC analysis showed no C-1027 production in SB1029 (Figure 2, panels III and IV). This result suggests that SgcJ and its homologues are pathway specific for 9-membered enediyne core biosynthesis. Close comparison of the SgcJ and NCS-Orf16 structures indeed showed subtle differences in protein surface electrostatics and the shape of the putative cavities (Figure 5), which may account for unique protein-protein interactions or accommodate varying enediyne core intermediates for different 9-membered enediyne biosynthetic machineries.

Conclusions

The enediynes have served as an outstanding model to study the biosynthesis of complex natural products. Since cloning of the first set of enediyne biosynthetic gene clusters nearly 15 years ago,13, 14 significant progress has been made toward elucidating the biosynthesis of the peripheral moieties present in enediynes, but biosynthesis of the enediyne cores remains elusive.1, 2, 3, 4 Comparative analysis of the enediyne gene clusters clearly revealed sets of genes that are highly conserved among the 9-membered, 10-membered or both enediynes, serving as outstanding candidates to study enediyne core biosynthesis.3, 4 Many of these candidate genes, however, are annotated to encode proteins of unknown function, inactivation of which in vivo afforded mutant strains that often failed to accumulate any biosynthetic intermediates, thereby revealing few clues for their function in enediyne core biosynthesis. As a result, in spite of the great progress made in the past decade in characterizing the enediyne PKS enzyme E and its cognate thioesterase, culminating in the discovery of a linear heptaene and its variants as the earliest possible intermediates or shunt metabolites for enediyne core biosynthesis,21, 22, 23, 24, 25 the exact nature of the nascent linear polyketide intermediates and their subsequent transformation to 9- and 10-membered enediyne cores remain unknown.

Recent technological advances in X-ray crystallography have made it possible to apply high-throughput structural biology as a practical tool to functionally characterize genes with deduced products that show little sequence homology to proteins of known function.39 While the current study fell short of establishing the exact function for SgcJ and its homologues, the structures of SgcJ and NCS-Orf16 and comparison to the NTF2-like superfamily of proteins allowed us to (i) define a putative substrate binding or catalytic active site, (ii) correlate the function of SgcJ to C-1027 biosynthesis by site-directed mutagenesis of the conserved residues lining this site, and (iii) propose that SgcJ and its homologues may play a catalytic role, along with other enediyne PKS-associated enzymes, in transforming the linear polyene intermediate into an enzyme-sequestered 9-membered enediyne core intermediate. These findings should help formulate hypotheses and design experiments to ascertain the function of SgcJ and its homologues in 9-membered enediyne core biosynthesis in the future.

Materials and methods

Strains, plasmids and culture conditions

Bacterial strains, plasmids and primers used in this study are summarized in Supplementary Tables S1, S2, and S3, respectively. Escherichia coli strains and M. luteus ATCC 9431 were cultured in lysogeny broth or grown on lysogeny broth agar plates. S. globisporus wild-type and recombinant strains were cultivated at 28 °C on ISP Medium 4 (Becton Dickinson, Franklin Lakes, NJ) for sporulation. Antibiotics for selection were used at the following concentrations: 25 μg ml−1 for apramycin and thiostrepton, and 50 μg ml−1 for chloramphenicol and kanamycin.

Construction of the ΔsgcJ mutant strain S. globisporus SB1027

The ΔsgcJ mutant strain SB1027 was constructed in the C-1027 overproducer S. globisporus SB1022 10 by gene replacement via homologous recombination. Briefly, the 1.5-kb kanamycin resistance cassette was amplified by PCR from pJTU4659 with primers sgcJtgtF and sgcJtgtR (Supplementary Table S3) and used to replace sgcJ in cosmid pBS1005 35 via λ-RED-mediated PCR targeting mutagenesis34 to generate pBS1143. The ΔsgcJ gene was then excised from pBS1143 as a ~21 kb XbaI-SpeI fragment and inserted into the XbaI site of pSET151 to afford pBS1144. pBS1144 was finally introduced into S. globisporus SB1022 by E. coli-S. globisporus conjugation.40 Exconjugants resulting from the desired double-crossover homologous recombination were selected on the basis of a kanamycin-resistant and thiostrepton-sensitive phenotype, and named SB1027, the genotype of which was confirmed by PCR and Southern analysis (Supplementary Figure S3).

Construction of ΔsgcJ complementation strains S. globisporus SB1028 and SB1029

A 0.8-kb fragment bearing oriT was amplified by PCR from plasmid pSET152 with primers oriT152F and oriT152R (Supplementary Table S3), digested with KpnI, and cloned into the same site of pUWL201pw to generate pBS1145. A 420-bp fragment of sgcJ and a 432-bp fragment of ncs-orf16 were amplified by PCR from cosmids pBS1005 35 and pBS5007 15, with primers sgcJ201NdeIF and sgcJ201EcoRIR, and ncs16NdeIF and ncs16HindIIIR, respectively (Supplementary Table S3). The resultant products were digested with NdeI and EcoRI (for sgcJ), and NdeI and HindIII (for ncs-orf16), and cloned into the same sites of pBS1145 to afford pBS1146 (for sgcJ) and pBS1147 (for ncs-orf16), respectively. Both pBS1146 and pBS1147, in which the expression of sgcJ and ncs-orf16 was under the control of the constitutive ErmE* promoter,40 were finally introduced into the ΔsgcJ mutant strain S. globisporus SB1027 by E. coli-S. globisporus conjugation.40 Exconjugants were selected on the basis of a thiostrepton-resistant phenotype as the desired complementation strains, and named SB1028 (that is, sgcJ expressing) and SB1029 (that is, ncs-orf16 expressing), respectively.

Site-directed mutagenesis of SgcJ

Plasmids of the sgcJ mutants, pBS1148 (W29A), pBS1149 (F37A), pBS1150 (Y72A), pBS1151 (D111A), pBS1152 (W118A) and pBS1153 (Y132A), were constructed by the QuikChange site-directed mutagenesis method, following the manufacturer’s protocol (Agilent Technologies, Santa Clara, CA) and using pBS1146 as a template. The primers used are listed in Supplementary Table S3. The mutations were verified by DNA sequencing. Each of the mutant constructs was then introduced into the ΔsgcJ mutant strain SB1027 by conjugation, yielding the complementation strains SB1030 (that is, SB1027/pBS1148), SB1031 (that is, SB1027/pBS1149), SB1032 (that is, SB1027/pBS1150), SB1033 (that is, SB1027/pBS1151), SB1034 (that is, SB1027/pBS1152) and SB1035 (that is, SB1027/pBS1153), respectively.

Production, isolation and analysis of C-1027

S. globisporus recombinant strains were cultured following a two-step fermentation procedure reported previously, and both stages utilized the same medium (1% glycerol, 2% dextrin, 1% fish meal, 0.5% peptone, 0.2% (NH4)2SO4, 0.1% MgSO4, 0.2% CaCO3, pH 7.0).9, 10, 13, 35 Briefly, fresh spores of the recombinant strains were inoculated into 250-ml baffled flasks containing 50 ml of medium and incubated at 28 °C and 250 rpm for 48 h. The resultant seed cultures (2.5 ml) were then inoculated into 250-ml baffled flasks containing 50 ml of the same medium, and fermentation continued at 28 °C and 250 rpm for 7 days. The C-1027 overproducer SB1022 and the ΔsgcJ mutant strain SB1027 were cultured in medium without any antibiotics. All other recombinant strains used in this study were cultured in medium supplemented with 5 μg ml−1 thiostrepton to retain the introduced plasmids.

Isolation and HPLC analysis of the C-1027 chromophore were carried out by following published procedures.9, 10, 13, 35 Briefly, fermentation broth (50 ml) was adjusted to pH 4.0 with 0.1 N HCl and centrifuged to remove any precipitate. To the supernatant, (NH4)2SO4 was then added to 50% saturation, and the precipitated C-1027 chromoprotein was collected by centrifugation and dissolved in 2 ml of 0.1 M potassium phosphate, pH 8.0. The latter was extracted with 2 ml of EtOAc twice, and the combined EtOAc extract was concentrated in vacuo and re-dissolved in CH3OH. HPLC was carried out on a Beckman ultrasphere-ODS dp analytical column (5 μm, 150 × 4.6 mm) (Beckman Coulter, Indianapolis, IN), eluted isocratically with 20 mM potassium phosphate (pH 6.8)/CH3CN (50:50 v/v) at a flow rate of 1.0 ml min−1 and UV detection at 350 nm on a Varian HPLC system with a Prostar 330 PDA detector (Agilent Technologies). LC-MS analysis of C-1027 was performed on an Agilent 6230 TOF LC-MS instrument (Agilent Technologies).

Determination of C-1027 production by bioassay and biochemical induction assays

Determination of C-1027 production by bioassay against M. luteus ATCC 9431 was carried out as described previously.9, 10, 13, 35 Alternatively, C-1027 production was also followed by the biochemical induction assay according to literature procedures,11 which uses the E. coli BR513 strain as an indicator and specifically detects agents with DNA damage activities. Briefly, 10 μl of fermentation supernatant or an agar plug was applied onto agar plates seeded with E. coli BR513 and incubated for 3–4 h at 37 °C. The plates were then overlaid with soft agar containing 0.7 mg ml−1 of X-gal and incubated at 37 °C for an additional 30–60 min to develop the characteristic blue color, indicative of DNA damage, and thus C-1027 production.

Gene expression and protein purification

PCR amplification of sgcJ from S. globisporus genomic DNA13 and ncs-orf16 from S. carzinostaticus genomic DNA15 by KOD Hot Start DNA polymerase (EMD Millipore, Billerica, MA) followed the manufacturer’s protocols using primers sgcJ-F and sgcJ-R and primers orf16-F and orf16-R, respectively (Supplementary Table S3). The amplification buffer was supplemented with betaine to a final concentration of 2.5 M. The PCR products were purified and cloned into pMCSG57 by ligation-independent cloning procedures,41 yielding pBS1154 (expressing sgcJ) and pBS1155 (expressing ncs-orf16). The expression plasmids were then transformed into E. coli BL21(DE3)-Gold strain (Stratagene, San Diego, CA) for protein production. Production and purification of SeMet-labeled SgcJ and NCS-Orf16 were performed according to a standard protocol.42 Briefly, the cells were cultured at 37 °C in 1 L of enriched M9 medium42 until OD600=1.0. After air-cooling the culture at 4 °C for 60 min, inhibitory amino acids (25 mg each per liter of L-valine, L-isoleucine, L-leucine, L-lysine, L-threonine and L-phenylalanine), selenomethionine (SeMet) and isopropyl-β-D-thiogalactoside (IPTG) were added. The cells were incubated overnight at 18 °C, harvested and re-suspended in lysis buffer (500 mM NaCl, 5% (v/v) glycerol, 50 mM HEPES pH 8.0, 20 mM imidazole and 10 mM β-mercaptoethanol). The SeMet-labeled proteins were purified by Ni-NTA affinity chromatography on an ÄKTAxpress system (GE Healthcare Life Sciences, Marlborough, MA) and digested with recombinant His6-tagged Tobacco etch virus (TEV) protease to remove the His6-tag. The final pure proteins were concentrated using Amicon Ultra-15 concentrators (Millipore, Bedford, MA) in 20 mM HEPES pH 8.0 buffer, 250 mM NaCl and 2 mM dithiothreitol.
Protein concentrations were determined from the absorbance at 280 nm using molar absorption coefficients (ɛ280=19,480 and 20,970 M−1 cm−1 for SgcJ and NCS-Orf16, respectively).43 The concentrations of SgcJ and NCS-Orf16 used for crystallization were both ~50 mg ml−1. Size-exclusion chromatography was performed using a Superdex 200 16/600 column (GE Healthcare Life Sciences) with an ÄKTA FPLC chromatographic system (GE Healthcare Life Sciences) at 4 °C. The column was calibrated with a size-exclusion calibration kit (GE Healthcare Life Sciences) and developed with the elution buffer (200 mM NaCl, 100 mM Tris, pH 8.0) at a flow rate of 0.5 ml min−1 with UV detection at 280 nm.
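The A280-based concentration determination above follows the Beer-Lambert law, c = A / (ɛ · l). The following Python sketch is illustrative only: the absorbance reading, dilution factor and molecular weight are hypothetical placeholders (they are not reported in this section), and a 1-cm path length is assumed. Only the two ɛ280 values are taken from the text.

```python
# Minimal Beer-Lambert sketch: A = epsilon * c * l  ->  c = A / (epsilon * l).
# The molar absorption coefficients come from the text; every other number
# used in the example at the bottom is a hypothetical placeholder.

EPSILON_280 = {            # M^-1 cm^-1, as stated in the Methods
    "SgcJ": 19480.0,
    "NCS-Orf16": 20970.0,
}


def molar_concentration(a280, epsilon, path_cm=1.0, dilution=1.0):
    """Return the molar concentration (mol/L) of the undiluted sample."""
    if epsilon <= 0 or path_cm <= 0 or dilution <= 0:
        raise ValueError("epsilon, path length and dilution must be positive")
    return a280 * dilution / (epsilon * path_cm)


def mass_concentration(molar, mol_weight):
    """Convert mol/L to mg/ml (numerically equal to g/L)."""
    return molar * mol_weight


if __name__ == "__main__":
    a280 = 0.65          # hypothetical reading of a diluted sample
    dilution = 100.0     # hypothetical 1:100 dilution
    mol_weight = 16000.0 # hypothetical placeholder in g/mol, not from the paper
    c = molar_concentration(a280, EPSILON_280["SgcJ"], dilution=dilution)
    print(f"SgcJ: {c * 1e3:.2f} mM, {mass_concentration(c, mol_weight):.1f} mg/ml")
```

Concentrated samples are typically diluted before measurement so that the reading stays in the linear range of the spectrophotometer, which is why the sketch carries an explicit dilution factor.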

Protein crystallization

Both SgcJ and NCS-Orf16 were screened for crystallization conditions using a Mosquito liquid dispenser (TTP Labtech, Melbourn, UK) and the sitting-drop vapor-diffusion technique in 96-well CrystalQuick plates (Greiner Bio-one, Monroe, NC). For each condition, 0.4 μl of protein (52.8 mg ml−1) and 0.4 μl of crystallization formulation were mixed. The mixture was equilibrated against 140 μl of the reservoir in the well. Commercially available crystallization screens were used, including MCSG-1–4 (Microlytic Inc., Burlington, MA), at 24 °C, 16 °C and 4 °C. For SgcJ, crystals were obtained under several conditions, the most promising being 0.1 M Na2HPO4 (adjusted to pH 4.2 with citric acid) and 40% (v/v) PEG 300 at 16 °C. The crystals grew within 1 week and reached sizes of approximately 0.100 mm × 0.020 mm × 0.010 mm. For NCS-Orf16, crystals suitable for X-ray diffraction were grown from the condition containing 0.2 M sodium formate and 20% (w/v) PEG 3350 at 16 °C.
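Because protein and crystallization formulation are mixed 1:1, every component is diluted twofold in the fresh drop before vapor diffusion begins to concentrate it against the reservoir. The Python sketch below works through that arithmetic for the two conditions named above. The function and dictionary names are illustrative, and the calculation ignores the buffer components carried in with the protein stock.

```python
# Sketch of the starting composition of a sitting drop, assuming simple
# additive mixing of the protein stock and the crystallization formulation.
# Names are illustrative; the concentrations and volumes come from the text.


def initial_drop_composition(protein_mg_ml, protein_ul, precipitant_ul, reservoir):
    """Return component concentrations in the freshly mixed drop."""
    if protein_ul <= 0 or precipitant_ul <= 0:
        raise ValueError("volumes must be positive")
    total_ul = protein_ul + precipitant_ul
    drop = {"protein (mg/ml)": protein_mg_ml * protein_ul / total_ul}
    for name, conc in reservoir.items():
        # Reservoir components are diluted by the protein solution.
        drop[name] = conc * precipitant_ul / total_ul
    return drop


SGCJ_CONDITION = {"Na2HPO4-citrate pH 4.2 (M)": 0.1, "PEG 300 (% v/v)": 40.0}
NCS_ORF16_CONDITION = {"sodium formate (M)": 0.2, "PEG 3350 (% w/v)": 20.0}

if __name__ == "__main__":
    for label, cond in (("SgcJ", SGCJ_CONDITION), ("NCS-Orf16", NCS_ORF16_CONDITION)):
        drop = initial_drop_composition(52.8, 0.4, 0.4, cond)
        print(label, {k: round(v, 3) for k, v in drop.items()})
```

During equilibration water leaves the drop until its precipitant concentration approaches that of the 140-μl reservoir, so both protein and precipitant in the drop rise from these starting values.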

Data collection, structure determination and refinement

Diffraction data were collected at 100 K at the 19-ID beamline of the Structural Biology Center at the Advanced Photon Source, Argonne National Laboratory.44 A single data set was collected near the Se K-edge peak anomalous position (0.9792 Å) from a single protein crystal of SgcJ to a resolution of 1.70 Å. The crystal was exposed for 3 s per 1.0° rotation with a crystal-to-detector distance of 240 mm. The data were recorded on an ADSC Quantum 315r CCD detector. For NCS-Orf16, data collection was the same except that the crystal-to-detector distance was 327 mm, and three data sets were collected and merged. Data collection strategy, integration and scaling were performed with the HKL3000 program package.45 A summary of the crystallographic data can be found in Table 1.
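The relation between wavelength, detector distance and attainable resolution can be sketched with Bragg's law, d = λ / (2 sin θ), where the scattering angle 2θ at the detector edge is atan(r / D). The Python example below is a rough geometric estimate, not the HKL3000 strategy calculation. It assumes a flat detector normal to the beam, the beam centered on the detector, and an active area of 315 mm × 315 mm (edge half-width r = 157.5 mm), none of which is stated in the text.

```python
import math

# h*c expressed in keV * Angstrom, used to convert wavelength to photon energy.
HC_KEV_ANGSTROM = 12.398419843

# Assumed half-width of the detector active area in mm (315 mm / 2); this
# geometry is an assumption of the sketch, not a value given in the Methods.
ASSUMED_EDGE_RADIUS_MM = 157.5


def photon_energy_kev(wavelength_a):
    """Photon energy in keV for a wavelength given in Angstrom."""
    return HC_KEV_ANGSTROM / wavelength_a


def edge_resolution(wavelength_a, distance_mm, radius_mm=ASSUMED_EDGE_RADIUS_MM):
    """Bragg spacing d (Angstrom) recorded at radius_mm from the beam center."""
    two_theta = math.atan2(radius_mm, distance_mm)   # scattering angle 2*theta
    return wavelength_a / (2.0 * math.sin(two_theta / 2.0))


if __name__ == "__main__":
    wavelength = 0.9792  # Angstrom, from the text
    print(f"energy: {photon_energy_kev(wavelength):.3f} keV")
    for label, distance in (("SgcJ", 240.0), ("NCS-Orf16", 327.0)):
        print(f"{label}: edge resolution ~{edge_resolution(wavelength, distance):.2f} A")
```

Under these assumptions the detector edge at 240 mm corresponds to roughly 1.7 Å, in line with the resolution reported for SgcJ, while the longer 327-mm distance trades resolution at the edge for better spot separation.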

The crystal structures of SgcJ and NCS-Orf16 were determined by SAD phasing, utilizing the anomalous signal from Se atoms with shelxc/d/e,46 mlphare,47 and dm48 in HKL300045 for SgcJ and SOLVE/RESOLVE49 for NCS-Orf16, and refined to 1.7 and 2.72 Å, respectively. For SgcJ, the initial model contained two protein chains comprising at least 90% of the residues in each chain. For NCS-Orf16, the initial model contained 10 protein chains covering 67% of the complete model, with 20% of the side chains assigned. Extensive manual model building with COOT50 and subsequent refinement using phenix.refine51 were performed until the R-factors converged to final values of R(Rfree)=0.168(0.195) and 0.217(0.256) for the structures of SgcJ and NCS-Orf16, respectively. The geometrical properties of the models were assessed using PROCHECK52 and Molprobity.53 The atomic coordinates and structure factors have been deposited in the Protein Data Bank with the accession codes 4I4K for SgcJ and 4OVM for NCS-Orf16.
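For readers less familiar with the refinement statistics quoted above, the conventional crystallographic R-factor is R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, with Rfree computed in the same way over a small set of reflections excluded from refinement. The Python sketch below illustrates the definition on made-up amplitudes. It is not how phenix.refine computes these values (bulk-solvent correction and scaling are omitted), and the toy data and function names are purely illustrative.

```python
# Toy illustration of R and Rfree; amplitudes are assumed to be already on a
# common scale. All numbers below are invented for the example.


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, free_flags):
    """Split reflections by free_flags (True = test set) and return (Rwork, Rfree)."""
    work = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if not f]
    free = [(o, c) for o, c, f in zip(f_obs, f_calc, free_flags) if f]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in free], [c for _, c in free])
    return r_work, r_free


if __name__ == "__main__":
    f_obs = [100.0, 80.0, 60.0, 40.0, 20.0]      # invented amplitudes
    f_calc = [90.0, 85.0, 55.0, 45.0, 25.0]      # invented amplitudes
    flags = [False, False, False, True, True]    # last two form the test set
    r_work, r_free = r_work_and_free(f_obs, f_calc, flags)
    print(f"Rwork = {r_work:.3f}, Rfree = {r_free:.3f}")
```

Because the test-set reflections never influence the model, Rfree is normally somewhat higher than R, as in the values reported for both structures.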