Introduction

Streptomyces, a representative genus of Gram-positive Actinobacteria, often contain 20–40 biosynthetic gene clusters (BGCs) in their genomes, and each BGC, upon proper expression, encodes specific secondary metabolites with a wide range of bioactivities including antibacterial, antifungal, antitumor, and immunosuppressive agents1,2,3,4,5. Therefore, Streptomyces species have been considered an important source of useful natural products, and have been the target of many metabolic engineering efforts6,7,8,9,10. However, some challenges exist in rational engineering of Streptomyces due to their GC-rich genomes and complex regulatory systems11,12. For these reasons, random mutagenesis is still conducted for the strain development of Streptomyces in addition to rational engineering13,14,15. Random mutagenesis approaches commonly use ultraviolet (UV) or chemical-based methods, but atmospheric and room-temperature plasma has also recently been utilized15,16,17,18. However, random mutagenesis always requires comprehensive subsequent genomic analysis to identify gene mutations that lead to the increased production performance of a production host.

In this study, we performed a comparative genomic analysis of the natural rapamycin producer Streptomyces rapamycinicus (formerly, Streptomyces hygroscopicus) NRRL 5491 (also ATCC 29253), and its mutant strain SRMK07 that overproduces rapamycin. Rapamycin is a hybrid of non-ribosomal peptide and polyketide with various useful bioactivities, including antifungal, antitumor, and immunosuppressive activities19,20,21,22,23. The biosynthesis of rapamycin requires 4,5-dihydroxycyclohex-1-enecarboxylic acid (DHCHC) as a starter unit, and also additional precursors, including malonyl-CoA, methylmalonyl-CoA, pipecolate, and NADPH. The SRMK07 strain was previously generated in our group via UV-based random mutagenesis24 (“Methods” section). Here, genomes of the wild-type and the SRMK07 strain were sequenced and compared in order to identify genomic changes in the SRMK07 strain that might be responsible for the enhanced rapamycin production. On the basis of the whole genome sequences, genome-scale metabolic models (GEMs) for these two strains were also reconstructed for metabolic analysis. This comparative genomic analysis revealed noteworthy differences that have likely contributed to the enhanced rapamycin production of the SRMK07 strain.

Results

Rapamycin production and growth of the SRMK07 strain

The rapamycin-overproducing mutant, S. rapamycinicus SRMK07, produced approximately 207 mg/L rapamycin, or around fourfold more rapamycin than the NRRL 5491 strain (Fig. 1a). Despite the significant improvement in rapamycin production, the SRMK07 strain showed almost normal growth in comparison to the wild-type (Fig. 1b). According to packed mycelium volume (PMV), the wild-type and the SRMK07 strain reached the peak of biomass accumulation on the fourth and sixth day, respectively. Additionally, both strains were grown on ISP2 and M1 plates to examine the morphology of their colonies and their sporulation patterns, respectively (Fig. 1c and “Methods” section). Both strains grew well on ISP2 plates, but the wild-type colonies were greater in size; neither strain sporulated on ISP2 plates. Meanwhile, on M1 plates, the wild-type formed spores, while the SRMK07 strain did not show any indication of sporulation.

Figure 1

Rapamycin production performance and growth of S. rapamycinicus NRRL 5491 (wild-type) and its rapamycin-overproducing mutant SRMK07. (a,b) Rapamycin production performance (a) and growth (b) of the two strains. The presented data represent the mean from triplicate experiments, and the error bars indicate standard deviations. (c) Growth phenotypes of the two strains grown on ISP2 and M1 plates. The wild-type appeared to sporulate on the M1 plate, while the SRMK07 strain did not show any indication of sporulation. Images were taken on the seventh day of cultivation.

Whole genome sequencing of S. rapamycinicus NRRL 5491 and the SRMK07 strain

We next conducted whole genome sequencing (WGS) of the NRRL 5491 and SRMK07 strains, using both PacBio and Illumina platforms, to identify genomic changes in the SRMK07 strain that have led to its high production performance of rapamycin. The resulting whole genome sequences of the wild-type and the SRMK07 appeared to be 12.47 Mbp and 9.56 Mbp, respectively (Fig. 2). In contrast to the SRMK07 strain’s genome that was initially obtained as a single contig, the wild-type’s genome data were obtained as seven contigs. To resolve this problem, two independent sequences of S. rapamycinicus NRRL 5491 genome, currently available in the NCBI database (GCA_003675955.125 and GCA_000418455.126), were used as references to connect the seven contigs of our wild-type’s genome. Among these two sequences, only GCA_000418455.1 is represented as a single contig although it contains multiple sequencing gaps. In contrast, the second sequence (i.e., GCA_003675955.1) contains four contigs, but fortunately, lacks sequencing gaps. Therefore, we utilized GCA_000418455.1 as an initial framework to determine the correct order of our own NRRL 5491 seven contigs, while GCA_003675955.1 served as a template to fill any sequencing gaps. The assembled wild-type genome sequence showed a size (12.47 Mbp) comparable to that of GCA_000418455.1 (12.70 Mbp).

Figure 2

Profiles of Illumina reads from S. rapamycinicus NRRL 5491 (wild-type) and the SRMK07 strain, mapped on the wild-type’s genome assembled in this study. (a) Profile of Illumina reads from the wild-type. (b) Profile of Illumina reads from the SRMK07 strain. The data were visualized using SignalMap (Roche NimbleGen, Inc., Pleasanton, CA). ‘A’ and ‘A’’ indicate the potentially deleted regions, and ‘B’ indicates the potentially duplicated region in the SRMK07 strain’s genome. Annotations marked with dashed lines correspond to the locations of the target genes for the relative quantification analysis using qPCR (Fig. 3) as well as the deleted core genes (Table 2) in the SRMK07 strain’s genome.

Comparative genomic analysis based on Illumina reads mapping

Comparison of the genomes of the wild-type and the SRMK07 strain showed a difference of about 2.91 Mbp; 10,140 protein-coding genes were predicted from the wild-type genome, while only 7757 protein-coding genes were predicted from the SRMK07 genome (Fig. 2), which indicates large genomic deletions in the SRMK07 genome as a result of the random mutagenesis. To further analyze the genomic differences between the wild-type and the SRMK07 strain, Illumina sequencing reads of the SRMK07 strain were mapped on the assembled wild-type genome (Fig. 2). As a result, large deletions were observed at both end regions of the SRMK07 genome, which corresponded to approximately 2.91 Mbp (the regions ‘A’ and ‘A’’ in Fig. 2b); this deletion size was almost consistent with the difference in the genome sizes of the wild-type and the SRMK07 strain.
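As an illustration of how such a read-depth comparison can flag deleted regions, the following minimal sketch assumes that mean Illumina read depths per fixed-size window have already been computed for both strains against the wild-type assembly. The function names, the median normalization, and the 0.1 ratio cutoff are assumptions made for this example, not the procedure or parameters used in this study.

```python
from statistics import median


def deleted_windows(wt_depth, mut_depth, min_ratio=0.1):
    """Return indices of windows whose normalized mutant depth is near zero.

    wt_depth, mut_depth: mean read depth per window, both mapped on the
    wild-type assembly. min_ratio is an assumed cutoff, not a study value.
    """
    wt_norm = median(wt_depth)
    mut_norm = median(mut_depth)
    if wt_norm == 0 or mut_norm == 0:
        raise ValueError("median depth is zero; normalization is not possible")

    flagged = []
    for i, (wt, mut) in enumerate(zip(wt_depth, mut_depth)):
        if wt == 0:
            continue  # no wild-type coverage here, so the window cannot be judged
        # Dividing by each profile's median cancels differences in sequencing depth
        ratio = (mut / mut_norm) / (wt / wt_norm)
        if ratio < min_ratio:
            flagged.append(i)
    return flagged


def total_deleted_bp(flagged, window_bp):
    """Approximate deleted length as (number of flagged windows) x (window size)."""
    return len(flagged) * window_bp


# Hypothetical toy profile: both chromosome ends lack reads in the mutant
wt = [100] * 10
mut = [0, 0, 50, 50, 50, 50, 50, 50, 50, 0]
flagged = deleted_windows(wt, mut)
```

With real data, the summed length of the flagged windows could then be compared with the difference in assembly sizes, as is done in the text for regions ‘A’ and ‘A’’.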

These genomic deletions in the SRMK07 strain were further confirmed by PCR experiments (Supplementary Fig. S1, Supplementary Table S1) and comparative genomic analysis (Supplementary Data 1). For the PCR experiments, primers were designed to target genes located in either end region of the wild-type’s genome, which were expected to be absent in the SRMK07 strain’s genome. Indeed, as a result of the PCR, target bands with expected size were obtained only from the wild-type’s genomic DNA (gDNA), and not from the SRMK07 strain’s genome (Supplementary Fig. S1). However, further thorough analysis will be necessary to confirm these potential genomic deletions in the SRMK07 strain because Streptomyces species have a linear chromosome with both ends having terminal inverted repeats (TIRs), and these TIRs make firm mapping of the borders of the deletions difficult. Full information on conflict positions in nucleotide sequences as well as missing genes in the SRMK07 strain in comparison with the wild-type is available in Supplementary Data 1.

We subsequently examined whether secondary metabolite BGCs in the SRMK07 strain were affected by the genomic deletions by running antiSMASH 5.0 for genomes of the wild-type and the SRMK07 strain27. As a result, the wild-type was predicted to have 52 BGCs, whereas 17 BGCs appeared to be lost in the SRMK07 strain (Table 1). Since nine of these missing BGCs encode polyketides or hybrids of non-ribosomal peptides and polyketides, the loss of these BGCs in the SRMK07 strain might have enhanced the production of rapamycin by redirecting precursors necessary for the rapamycin biosynthesis.

Table 1 Biosynthetic gene clusters (BGCs) that appeared to be absent in the SRMK07 strain’s genome.

Interestingly, a genomic region (9,783,802–10,695,700 bp in the wild-type genome) was observed in the SRMK07 strain where the number of the mapped reads was notably greater than other regions of the genome (1,174,259–9,783,802 bp in the wild-type genome) by approximately twofold (the region ‘B’ in Fig. 2b); this genomic region strongly indicates the duplication of genes. Since this genomic region in the SRMK07 strain covers the rapamycin BGC (8,583,793–8,830,075 bp in the SRMK07 genome, which corresponds to 9,758,976–10,004,25 bp in the wild-type genome), duplication of genes in this region might have also contributed to the enhanced rapamycin production. To further verify the duplication of this genomic region, we performed real-time PCR (qPCR) for relative quantification of genes from the potentially duplicated region in comparison with genes from other regions that are known to exist as a single copy across various Streptomyces species (Fig. 3, Supplementary Table S2); information on single-copy genes was obtained from OrthoDB28. For this analysis, five reference single-copy genes were selected, which encode: NADH-quinone oxidoreductase subunit H; RtcB family protein; RNA helicase; aspartate kinase; and type I DNA topoisomerase. Likewise, eight genes were selected from the potentially duplicated region in the SRMK07 strain’s genome, which encode: 3-ketoacyl-CoA thiolase; regulatory protein AfsR; l-lysine cyclodeaminase; ferredoxin; glycerol uptake operon antiterminator regulatory protein; a hypothetical protein encoded by a rapamycin biosynthetic gene; SDR family oxidoreductase; and putative ABC transporter ATP-binding protein YbiT. l-lysine cyclodeaminase (encoded by rapL), ferredoxin (encoded by rapO gene), and the hypothetical protein all belong to the rapamycin BGC. 
The qPCR results with the gene encoding NADH-quinone oxidoreductase subunit H as a control showed that the relative quantities of the reference single-copy genes in the SRMK07 genome ranged from 0.7 to 1.3, while the relative quantities of the genes from the potentially duplicated region were close to 2 (Fig. 3). These results strongly suggest the duplication of the genomic region in the SRMK07 strain where the rapamycin BGC is located.
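The calculation behind such relative quantities can be sketched as follows. The text does not state the exact formula, so this example assumes the commonly used 2^(−ΔΔCt) method, with the control gene (here ‘R1’) as the reference, wild-type gDNA as the calibrator, and approximately 100% amplification efficiency. All Ct values shown are hypothetical.

```python
def relative_quantity(ct_target_mut, ct_ref_mut, ct_target_wt, ct_ref_wt):
    """Relative copy number of a target gene in the mutant versus the wild-type.

    Assumes the 2^(-ddCt) method and ~100% primer efficiency (one doubling
    per cycle); the study's actual calculation may differ.
    """
    d_ct_mut = ct_target_mut - ct_ref_mut  # target vs. control gene, mutant gDNA
    d_ct_wt = ct_target_wt - ct_ref_wt     # target vs. control gene, wild-type gDNA
    return 2.0 ** -(d_ct_mut - d_ct_wt)


# Hypothetical Ct values: the target amplifies one cycle earlier in the mutant,
# which corresponds to a twofold higher template amount
duplicated = relative_quantity(19.0, 20.0, 20.0, 20.0)
single_copy = relative_quantity(20.0, 20.0, 20.0, 20.0)
```

Under these assumptions, a value close to 1 is expected for a single-copy gene and a value close to 2 for a gene present in two copies.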

Figure 3

Relative quantification of genes from the potentially duplicated region in the SRMK07 strain’s genome. For this, five reference single-copy genes (‘R1’ to ‘R5’ defined below) and eight genes (‘T1’ to ‘T8’ defined below) from the potentially duplicated region in the SRMK07 strain’s genome were first selected for the qPCR experiments. A gene ‘R1’ encoding NADH-quinone oxidoreductase subunit H (5,392,720–5,393,685 bp in the NRRL 5491 genome) was used as a control to measure the relative quantity of the other four single-copy genes (blue bars), and the eight genes from the potentially duplicated region (red bars). Genes ‘R2’, ‘R3’, ‘R4’, and ‘R5’ are those known to exist as a single copy across Streptomyces species, and encode the following proteins, respectively (along with the chromosome location in the NRRL 5491 genome): RtcB family protein (5,488,049–5,488,159 bp); RNA helicase (5,756,656–5,760,582 bp); aspartate kinase (6,341,319–6,342,599 bp); and type I DNA topoisomerase (6,399,363–6,402,227 bp). Genes ‘T1’, ‘T2’, ‘T3’, ‘T4’, ‘T5’, ‘T6’, ‘T7’, and ‘T8’ from the potentially duplicated region encode the following proteins, respectively: 3-ketoacyl-CoA thiolase (9,778,772–9,809,242 bp); regulatory protein AfsR (9,875,692–9,877,524 bp); l-lysine cyclodeaminase (9,900,735–9,901,766 bp); ferredoxin (9,904,048–9,904,284 bp); glycerol uptake operon antiterminator regulatory protein (10,001,392–10,002,039 bp); hypothetical protein (10,003,629–10,004,255 bp); SDR family oxidoreductase (10,035,843–10,036,631 bp); and putative ABC transporter ATP-binding protein YbiT (10,693,310–10,694,929 bp). ‘T1’ and ‘T8’ represent the start and end regions of the potentially duplicated region. ‘T3’, ‘T4’ and ‘T6’ belong to the rapamycin BGC. The primers used for these qPCR experiments are available in Supplementary Table S2. The presented data represent the mean from triplicate experiments, and the error bars indicate standard deviations.

Core gene analysis

The SRMK07 strain showed normal growth despite the large genomic deletions. This observation raised the question of whether the core genes necessary for normal growth are present in this strain; the core genes here refer to those present in genomes of the vast majority of biologically related organisms, for example, Streptomyces species in this study, likely because of their biological importance29,30. To examine the distribution of core genes in the SRMK07 strain’s genome, the software program ‘Antibiotic Resistant Target Seeker’ (ARTS) was used, which allows the detection of core genes, including housekeeping genes and resistance genes associated with BGCs31. ARTS predicted 393 and 389 core genes (out of 10,140 and 7757 protein-coding genes) from the genomes of the wild-type and the SRMK07 strain, respectively (Table 2, Supplementary Data 2). Hence, only four core genes were predicted to be missing in the SRMK07 strain. These four genes are TIGR01235, TIGR03442, TIGR03438, and TIGR00173 (all TIGRFAM identifiers32), which encode: pyruvate carboxylase; ergothioneine biosynthesis protein EgtC (or γ-glutamyl-hercynylcysteine sulfoxide hydrolase); ergothioneine-biosynthetic methyltransferase EgtD (or histidine N-alpha-methyltransferase); and 2-succinyl-5-enolpyruvyl-6-hydroxy-3-cyclohexene-1-carboxylate synthase MenD, respectively. It should be noted that no evidence for paralogs of these four genes was found in the SRMK07 genome according to ARTS and OrthoDB.

Table 2 Deleted core genes in the SRMK07 strain’s genome.

A close examination of metabolic genes in the SRMK07 strain suggested that the loss of these four core genes should not affect the overall metabolic activities of the SRMK07 strain. Pyruvate carboxylase (TIGR01235) is an anaplerotic enzyme, and is involved in regulating a phosphoenolpyruvate (PEP)-pyruvate-oxaloacetate pool that is critical for the optimal distribution of metabolic fluxes in central carbon metabolism33. Despite the loss of this gene, other genes involved in regulating the PEP-pyruvate-oxaloacetate pool were still present in the SRMK07, including PEP carboxykinase, PEP carboxylase, malic enzyme, and malate dehydrogenase. Next, EgtC (TIGR03442) and EgtD (TIGR03438) are involved in the biosynthesis of ergothioneine that detoxifies reactive oxygen species and reactive nitrogen species for redox homeostasis, and the absence of ergothioneine results in higher oxidative stress34,35. Biological roles of ergothioneine in Gram-positive bacteria can be complemented by other detoxifying molecules, such as mycothiol, and glutathione34,35. Our genomic analysis of the wild-type and the SRMK07 showed that both strains carry intact genes for the biosynthesis of mycothiol36 (i.e., mshA, mshB, mshC, and mshD) and glutathione. Finally, menD (TIGR00173) is part of the genes, menABCDEFGH, that encode the biosynthesis of menaquinone; menaquinone plays an important role in the electron transport in Gram-positive bacteria37. Fortunately, an alternative biosynthetic pathway for menaquinone, known as futalosine pathway, has also been reported in Streptomyces coelicolor38,39,40. Homologs of the genes in this futalosine pathway were found to be present in both the wild-type and the SRMK07 strain (Supplementary Table S3).

Taken together, the additional genes that were found intact in the SRMK07 strain have the possibility to complement the function of the absent four core genes (i.e., TIGR01235, TIGR03442, TIGR03438, and TIGR00173), and would allow the SRMK07 strain’s normal growth despite the large genomic deletions. Moreover, there were core genes that exist in multiple copies in the wild-type, but at least a single copy was found for all these core genes in the SRMK07 strain (Table 2).
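The comparison of core gene content between the two strains amounts to a per-family copy-number comparison. The following minimal sketch assumes that the core gene hits of each strain are available as a flat list of family identifiers, one entry per gene copy. The input format and the identifiers FAM_X and FAM_Y are hypothetical; only the TIGRFAM identifiers are taken from the text.

```python
from collections import Counter


def compare_core_genes(wt_hits, mut_hits):
    """Split wild-type core gene families into 'lost' and 'reduced' in the mutant.

    wt_hits, mut_hits: lists of family identifiers, one entry per gene copy.
    Returns (lost, reduced) as sorted lists of family identifiers.
    """
    wt, mut = Counter(wt_hits), Counter(mut_hits)
    # Counter returns 0 for families that are absent from the mutant
    lost = sorted(f for f in wt if mut[f] == 0)
    reduced = sorted(f for f in wt if 0 < mut[f] < wt[f])
    return lost, reduced


# Hypothetical toy input: FAM_X has two copies in the wild-type and one in the mutant
wt_hits = ["TIGR01235", "TIGR00173", "FAM_X", "FAM_X", "FAM_Y"]
mut_hits = ["FAM_X", "FAM_Y"]
lost, reduced = compare_core_genes(wt_hits, mut_hits)
```

Families in the first list would correspond to the deleted core genes, while those in the second list would correspond to core genes that persist with fewer copies.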

Comparative metabolic network analysis

Enhanced production of a secondary metabolite might also be linked with changes in a metabolic network, providing precursors and energy molecules necessary for rapamycin biosynthesis. To examine this question, we reconstructed GEMs, SrapWT2040 and SrapUV2010, that describe the metabolism of the NRRL 5491 and SRMK07 strains, respectively (Supplementary Data 3, 4). In contrast to 2383 protein-coding genes that appeared to be deleted in the SRMK07 strain as a result of the random mutagenesis, the constructed SrapUV2010 appeared to have only 30 fewer biochemical reactions than SrapWT2040; these 30 biochemical reactions are associated with a total of 372 metabolic genes (Fig. 4a, Table 3). As expected, biochemical reactions associated with a pathway ‘Biosynthesis of secondary metabolites’ were shown to be most affected in SrapUV2010; six corresponding biochemical reactions, out of the 30 reactions, were missing in SrapUV2010. Additional metabolic pathways that were affected by the random mutagenesis include: central carbon metabolism (i.e., fructose, mannose, glyoxylate, and TCA cycle), amino acid metabolism (i.e., phenylalanine, arginine, proline, and glycine), and degradation pathways (i.e., benzoate, styrene, and polycyclic aromatic hydrocarbon).

Figure 4

Statistics and simulation results of the genome-scale metabolic models (GEMs) of S. rapamycinicus NRRL 5491 (wild-type) and its mutant SRMK07. (a) Number of genes, reactions, and metabolites of the GEMs, SrapWT2040 and SrapUV2010, that represent the wild-type and its mutant SRMK07, respectively. (b) Number of essential genes and essential reactions predicted using SrapWT2040 and SrapUV2010. SrapUV2010 was predicted to have eight additional essential genes, and one additional essential reaction in comparison with SrapWT2040. (c) The growth prediction results using SrapWT2040 in comparison with the reported growth data that involved 17 individual carbon sources (Supplementary Table S4) and 19 individual nitrogen sources (Supplementary Table S5). It should be noted that SrapUV2010 also generated the same prediction accuracy as SrapWT2040.

Table 3 List of 30 biochemical reactions available in SrapWT2040 (wild-type), but absent in SrapUV2010 (SRMK07 strain).

To gain insights into the metabolic effects of losing these 30 reactions, we first conducted gene/reaction essentiality analysis for the two GEMs, SrapWT2040 and SrapUV2010 (Fig. 4b). We found that none of these 30 reactions was essential for the growth of the wild-type or the SRMK07 strain. Overall, SrapUV2010 showed a slightly greater number of essential genes and essential reactions than SrapWT2040: 189 versus 181 essential genes, and 513 versus 512 essential reactions (Fig. 4b). This observation is likely attributable to the lower metabolic robustness of SrapUV2010 as a result of losing the 30 reactions. For example, NAD kinase is encoded by two paralogous nadK genes in the wild-type, but only one nadK gene remains in the SRMK07 strain. Therefore, this single nadK gene becomes essential in SrapUV2010 upon gene deletion in silico. Also, the one additional essential reaction in SrapUV2010 corresponds to the reaction LIPOCT catalyzed by lipoyl(octanoyl) transferase. Likewise, LIPOCT is a non-essential reaction in the wild-type, but became essential as a result of deleting the reaction OCTNLL that is catalyzed by octanoate non-lipoylated apo domain ligase in ‘lipoate metabolism’. Both OCTNLL and LIPOCT contribute to the biosynthesis of lipoate, which is an essential cofactor for 2-oxoacid dehydrogenases and glycine cleavage system in central carbon metabolism41.
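The nadK case can be illustrated with a toy evaluation of a gene-protein-reaction (GPR) rule under single-gene deletion. This sketch is not the analysis performed on the GEMs, which relies on growth simulations of the full models (COBRApy offers single_gene_deletion for that purpose). It only shows why losing one paralog makes the remaining gene essential, assuming the reaction itself is required for growth. The gene identifiers nadK_1 and nadK_2 are hypothetical.

```python
def reaction_active(isozymes, deleted_gene):
    """Evaluate a GPR rule after deleting one gene.

    isozymes: list of gene sets. Genes inside a set form one enzyme complex
    (AND); the sets are alternative enzymes for the same reaction (OR).
    """
    return any(deleted_gene not in complex_genes for complex_genes in isozymes)


def essential_genes(isozymes):
    """Genes whose single deletion inactivates a reaction assumed to be essential."""
    genes = set().union(*isozymes)
    return sorted(g for g in genes if not reaction_active(isozymes, g))


# Wild-type: two paralogs, either one can carry the NAD kinase reaction
wt_gpr = [{"nadK_1"}, {"nadK_2"}]
# SRMK07: only one paralog remains
mut_gpr = [{"nadK_1"}]

wt_essential = essential_genes(wt_gpr)
mut_essential = essential_genes(mut_gpr)
```

In the wild-type rule no single deletion inactivates the reaction, whereas in the mutant rule the remaining gene is flagged as essential.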

Next, parsimonious flux balance analysis (pFBA) was implemented for SrapWT2040 and SrapUV2010 to gain insights into their intracellular flux distributions when producing rapamycin. The pFBA simulation revealed two reactions with greater flux values in SrapUV2010, ME2 (catalyzed by NADP-dependent malic enzyme) and G3PD2 (NADP-dependent glycerol-3-phosphate dehydrogenase) (Fig. 5). ME2 and G3PD2 both produce NADPH, which is a required cofactor for the biosynthesis of various secondary metabolites42,43; greater activities of these two corresponding enzymes could be another factor for the SRMK07 strain’s enhanced rapamycin production performance. According to a previous study, overexpressing sco5261 for the ME2 reaction increased the production of a secondary metabolite actinorhodin in S. coelicolor44. Genes for these two reactions have never been targeted for the enhanced production of rapamycin, and thus, can be considered as overexpression targets for metabolic engineering of S. rapamycinicus in the future.

Figure 5

Intracellular metabolic flux distributions predicted using the genome-scale metabolic models (GEMs) of S. rapamycinicus NRRL 5491 (wild-type) and its mutant SRMK07. Intracellular metabolic flux distributions of the wild-type and the SRMK07 strain were predicted using parsimonious flux balance analysis (pFBA) of the GEMs. Metabolic fluxes predicted to be increased in the SRMK07 strain, compared to the wild-type, are marked with a red arrow. Direct precursors for rapamycin biosynthesis are shown in blue boxes. Dotted arrows indicate multiple reactions. Metabolite abbreviations are: DCDC 4,5-dihydroxycyclohexa-1,5-dienecarboxylic acid, DHCHC 4,5-dihydroxycyclohex-1-enecarboxylic acid.

Discussion

In this study, we conducted a comparative genomic analysis for S. rapamycinicus NRRL 5491 and its mutant strain SRMK07 that overproduces rapamycin. For this, both strains were subjected to WGS, which subsequently allowed the identification of large deletions at both end regions of the SRMK07 genome as well as the potentially duplicated region that covers the rapamycin BGC. The duplication of the rapamycin BGC as well as the deletion of the extremities of the chromosome that includes many BGCs are likely to have positive effects on the rapamycin biosynthesis. Obviously, duplicated rapamycin BGC would increase the dosage of rapamycin biosynthetic and regulatory genes, contributing to the enhanced biosynthesis of this molecule. Also, in the absence of multiple BGCs, precursors and energy used for the biosynthesis of the corresponding molecules may be redirected toward the rapamycin biosynthesis, further improving the rapamycin production. Core gene analysis was additionally conducted using ARTS to explain the SRMK07 strain’s normal growth despite the large genomic deletions. Finally, GEMs of the wild-type and the SRMK07 strain were reconstructed to examine these two strains’ metabolic differences.

This study suggests future research opportunities in metabolic engineering for the enhanced production of rapamycin and other secondary metabolites. Previous studies have shown the benefits of genome reduction, which reduces biological complexity, increases genome stability, and improves the production of secondary metabolites45. Relevant examples include Streptomyces avermitilis SUKA17 producing streptomycin and cephamycin46, S. coelicolor M1152 and M1154, both strains producing actinorhodin and chloramphenicol47, and Streptomyces sp. FR-008 LQ3 as a chassis for heterologous expression of biosynthetic genes for secondary metabolites48. Therefore, the SRMK07 strain can also be considered as a promising platform to construct a novel superhost for further enhanced production of rapamycin and other secondary metabolites49,50. Also, in addition to the two reactions ME2 (catalyzed by NADP-dependent malic enzyme) and G3PD2 (NADP-dependent glycerol-3-phosphate dehydrogenase) that can be considered as overexpression targets, further gene manipulation targets can be systematically predicted via comprehensive simulation studies using the GEMs reconstructed. Finally, additional omics analyses, for example transcriptome and metabolome, in combination with genomic and metabolic network analyses would provide more comprehensive phenotypic profiles of the wild-type and the SRMK07 strain.

Conclusions

Comparative genomic analysis conducted in this study generated biological clues that could explain the enhanced rapamycin production performance of the S. rapamycinicus SRMK07 strain that was previously generated via random mutagenesis. The genomic and computational approaches undertaken in this study suggest gene manipulation targets to further enhance the production of rapamycin that can be experimentally tested through metabolic engineering. The approaches in this study can also be considered for analyzing other mutant strains generated from random mutagenesis.

Methods

Strains

S. rapamycinicus NRRL 5491, the wild-type, and its rapamycin overproducing mutant SRMK07 were used in this study. The SRMK07 strain was previously generated via UV-based random mutagenesis24. The wild-type spores were resuspended in saline buffer (0.85% NaCl and 0.1% Tween 90) to dilute the concentration to about 10⁸/mL. Next, 100 µL of the diluted spores were spread evenly on an M1 plate (2.5 g/L corn steep powder, 3 g/L yeast extract, 3 g/L CaCO3, 0.3 g/L FeSO4, 10 g/L wheat starch, and 20 g/L agar), and exposed to UV for 60 s. UV conditions were set to be 254 nm wavelength, 40 W, and 25–30 cm distance from the spore suspension to achieve a 99% killing rate. The UV-treated spores were incubated at 28 °C for 7–10 days using agar plates containing 2 g/L rapamycin, which allowed the screening of rapamycin-resistant strains. The SRMK07 strain used in this study was obtained by measuring rapamycin from the resistant strains through liquid culture in a 250 mL flask.

Cultivation conditions

Seed cultures of the two strains were incubated in GYM medium for 3 days, and transferred to a main cultivation medium. Flask cultivations were carried out for 14 days at 28 °C and 250 rpm (Fig. 1a,b). GYM medium contains: 4 g/L glucose, 4 g/L yeast extract, and 10 g/L malt extract. The main medium used in flask cultivations was adopted from Yun et al.7 with a slight modification. The main medium contains: 10 g/L M100, 50 g/L glycerol, 10 g/L cottonseed meal, 10 g/L soybean meal, 6.5 g/L yeast extract, 5 g/L (NH4)2SO4, 20 g/L L-lysine, 4 g/L L-tyrosine, 0.7 g/L KH2PO4, 1.14 g/L K2HPO4, 5 g/L NaCl, 0.05 g/L FeSO4·7H2O, and 42.6 g/L MES.

The two strains were also cultured on solid media in order to find differences in their growth phenotypes with focus on morphology and sporulation. ISP2 plate (4 g/L glucose, 4 g/L yeast extract, 10 g/L malt extract, and 20 g/L agar) was used to compare general growth characteristics of the two strains. M1 plate was used to examine sporulation of the two strains. The two strains were grown on the solid media for 7 days.

Measurement of rapamycin concentration and packed mycelium volume (PMV)

During the flask cultivations, 500 μL of culture broth was sampled and used for the measurement of rapamycin concentration. For this, culture broth was mixed with methanol in a 1:1 ratio, and vortexed for 30 min. The mixed solutions were subsequently centrifuged, and the supernatants were collected for the analysis using Waters 2695 (Waters, Milford, MA) high-performance liquid chromatography (HPLC) equipped with Agilent Eclipse XDB-C18 column (Agilent Technologies, Santa Clara, CA) and Waters 2487 detector (Waters, Milford, MA). In this HPLC analysis, water and acetonitrile were used as a mobile phase with ratio varied from 80:20 (v/v) to 20:80 (v/v) at a 1 mL/min flow rate, and 277 nm wavelength was used for the detector. Detected peaks were compared with a peak of the standard rapamycin compound (Sigma-Aldrich, St. Louis, MO) to measure the rapamycin concentration.

Packed mycelium volume (PMV) was used to estimate the growth of the NRRL 5491 and SRMK07 strains (Fig. 1b) because it was difficult to measure optical density or dry cell weight (DCW) from the insoluble media used for the rapamycin production51,52. For this, the collected culture broth (each 5 mL) was centrifuged at 3000 × g for 20 min. PMV was expressed as a percentage (%) by dividing the volume of the packed mycelium pellet by the sample volume (5 mL).
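The PMV calculation is a simple ratio, sketched below. The function name and the example pellet volume are hypothetical; only the 5 mL sample volume comes from the text.

```python
def pmv_percent(packed_volume_ml, sample_volume_ml=5.0):
    """Packed mycelium volume as a percentage of the sampled culture volume."""
    if sample_volume_ml <= 0:
        raise ValueError("sample volume must be positive")
    return 100.0 * packed_volume_ml / sample_volume_ml


# Hypothetical reading: a 1.0 mL pellet from a 5 mL sample
example_pmv = pmv_percent(1.0)
```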

Whole genome sequencing

For WGS, the wild-type and SRMK07 strain were incubated in tryptic soy broth (TSB) medium (17 g/L tryptone, 3 g/L soytone, 2.5 g/L glucose, 5 g/L NaCl, and 2.5 g/L K2HPO4) for 3–4 days, and gDNA samples of the two strains were extracted using Wizard Genomic DNA Purification Kit (Promega, Madison, WI). Next, WGS and genome annotation of the two strains were conducted at DNA Link, Inc. (Seoul, Korea) by using the PacBio (Pacific Biosciences, Menlo Park, CA) and Illumina (Illumina Inc., San Diego, CA) platforms together. To increase the genome sequence quality, the genome correction method suggested by Lee et al.53 was used; if more than 80% of Illumina reads for a specific genomic site conflicted with the PacBio results, the sequence at that site was substituted according to the Illumina results using CLC Genomics Workbench version 6.5.1 (CLC bio, Aarhus, Denmark).
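The correction rule can be expressed as a per-site decision. The following minimal sketch is only an illustration of the 80% criterion: the actual correction was carried out in CLC Genomics Workbench, and the function name, the input format, and the choice of the most frequent conflicting base as the replacement are assumptions made here.

```python
from collections import Counter


def corrected_base(pacbio_base, illumina_bases, threshold=0.8):
    """Return the base to keep at one genomic site.

    pacbio_base: base called in the PacBio assembly.
    illumina_bases: bases observed in the Illumina reads covering the site.
    The PacBio base is replaced only if MORE than `threshold` of the
    Illumina reads disagree with it.
    """
    if not illumina_bases:
        return pacbio_base  # no Illumina evidence, keep the PacBio call
    conflicting = [b for b in illumina_bases if b != pacbio_base]
    if len(conflicting) / len(illumina_bases) > threshold:
        # Assumed choice: substitute the most frequent conflicting base
        return Counter(conflicting).most_common(1)[0][0]
    return pacbio_base


# Hypothetical pileups: 9 of 10 reads conflict (substitute), 8 of 10 conflict (keep)
substituted = corrected_base("A", list("GGGGGGGGGA"))
kept = corrected_base("A", list("GGGGGGGGAA"))
```

Note that exactly 80% conflicting reads does not trigger a substitution in this sketch, following the "more than 80%" wording.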

Real-time PCR (qPCR) for verifying the potentially duplicated region in the SRMK07 strain’s genome

To verify the potentially duplicated region in the SRMK07 strain’s genome, five genes known to exist as a single copy in more than 100 Streptomyces species were selected based on OrthoDB28 (https://www.orthodb.org), and eight genes that represent the potentially duplicated region were selected based on our genome annotation results (Fig. 3). The qPCR experiments were conducted in accordance with the manufacturer’s protocol using gDNA of the two strains and SYBR Green PCR Master Mix (Thermo Fisher Scientific, Waltham, MA).

Analysis of biosynthetic gene clusters (BGCs) and core genes

BGCs and core genes of the wild-type and SRMK07 strain were analyzed using antiSMASH version 5.027 (http://antismash.secondarymetabolites.org) and ARTS (Antibiotic Resistant Target Seeker)31 version 2 (https://arts.ziemertlab.com), respectively. antiSMASH was implemented using the default options and ‘loose’ detection strictness. ARTS was also implemented with the default options with ‘Actinobacteria’ for ‘Reference set’.

Generation of draft genome-scale metabolic models (GEMs)

The draft GEMs that represent the metabolism of the NRRL 5491 and SRMK07 strains were generated using a Python-based GEM reconstruction tool that was previously released as a feature of antiSMASH 3.054. The Python-based GEM reconstruction tool requires protein sequences and their corresponding enzyme commission (EC) numbers for a target organism as well as a high-quality GEM of a biologically close organism in order to build a draft GEM. In this study, EC numbers for protein sequences from the wild-type and SRMK07 strain were predicted using DeepEC, a deep learning-based EC number prediction tool55. A high-quality GEM of S. coelicolor, iKS131756, was used as a template GEM. The resulting draft GEMs were generated in Systems Biology Markup Language (SBML), and further model refinement and simulations were implemented using COBRApy57.

Refinement of the draft GEMs

The draft GEM for the wild-type was first validated using the reported experimental growth data of S. rapamycinicus NRRL 5491 that involved 17 individual carbon sources and 19 individual nitrogen sources58 (Fig. 4c, Supplementary Tables S4, S5). Simulation of the draft GEM showed reasonably high accuracy, 78%, in comparison with the experimental growth data. Next, information on the rapamycin biosynthetic pathway, involving 14 condensation steps, a ring closure step, and post-polyketide synthase modification steps22, was added to the draft GEM. This rapamycin biosynthetic pathway was expressed as a single stoichiometric equation by implementing Biosynthetic Gene cluster Metabolic pathway Construction (BiGMeC), a pipeline that helps to create a metabolic pathway for a biosynthetic gene cluster encoding polyketides and non-ribosomal peptides59. Finally, MEMOTE (i.e., metabolic model tests)60 was implemented to evaluate the quality of the draft GEM. The same procedure was undertaken for the GEM representing the SRMK07 strain.
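The accuracy reported for the growth predictions is the fraction of tested carbon and nitrogen sources for which the simulated growth outcome agrees with the experimental one. A minimal sketch of that comparison is shown below; the function name and the four example outcomes are hypothetical, and the actual per-source results are those listed in Supplementary Tables S4 and S5.

```python
def prediction_accuracy(predicted, observed):
    """Fraction of conditions where predicted growth matches observed growth.

    predicted, observed: equally long lists of booleans (True = growth).
    """
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Hypothetical outcomes for four nutrient sources (not data from this study)
predicted = [True, True, False, True]
observed = [True, False, False, True]
accuracy = prediction_accuracy(predicted, observed)
```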

Prediction of intracellular metabolic flux distributions using the GEMs

Parsimonious flux balance analysis (pFBA) was implemented to predict intracellular metabolic flux distributions because of its robust predictive power61. For pFBA of SrapWT2040, a rapamycin production rate of 5 × 10⁻⁴ mmol/g DCW/h and a glycerol uptake rate of 0.8 mmol/g DCW/h were provided as constraints based on a previous study62. For SrapUV2010, the rapamycin production rate and glycerol uptake rate were set to 2.0 × 10⁻³ mmol/g DCW/h and 1.0 mmol/g DCW/h, respectively; these values were adopted from our cultivation experiments (Fig. 1) and Wang et al.63.
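For orientation, pFBA is commonly formulated as a two-step optimization, sketched below in generic notation. The symbols (stoichiometric matrix S, flux vector v, bounds v^min and v^max, objective vector c) are standard conventions and are not taken from this study; the objective used in the first step is likewise an assumption here.

```latex
\begin{aligned}
\text{Step 1:}\quad & Z^{*} = \max_{v}\; c^{\top} v
  \quad \text{s.t.}\quad S v = 0,\;\; v^{\min} \le v \le v^{\max} \\
\text{Step 2:}\quad & \min_{v}\; \sum_{i} \lvert v_i \rvert
  \quad \text{s.t.}\quad S v = 0,\;\; v^{\min} \le v \le v^{\max},\;\; c^{\top} v = Z^{*}
\end{aligned}
```

In this formulation, the rapamycin production rate and the glycerol uptake rate stated above enter through the bounds of the corresponding exchange reactions. The sign convention for uptake fluxes depends on how the model defines its exchange reactions. The second step then selects, among all flux distributions that reach the optimum of the first step, the one with the smallest total flux.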