Background & Summary

Streptomyces species are Gram-positive microorganisms that play a significant role in the production of valuable secondary metabolites such as antibiotics, anticancer drugs, and pesticides. S. clavuligerus is a native producer of clavulanic acid (CA) and cephamycin and is used as an industrial strain for the production of these compounds1. CA, a widely used β-lactamase inhibitor, is a major active component in the extensively prescribed antibiotic, amoxicillin-containing AugmentinTM. The β-lactam moiety of amoxicillin covalently binds to the bacterial proteins responsible for crosslinking peptidoglycan precursors that are essential for bacterial cell-wall formation2. However, many pathogenic bacteria can easily acquire resistance against β-lactam antibiotics such as amoxicillin by expressing β-lactamases3. Notably, CA has no direct antibiotic activity but can inhibit β-lactamases and thus restore β-lactam (i.e. amoxicillin) sensitivity in β-lactamase expressing strains.

Random mutagenesis using ultraviolet (UV) irradiation or chemical mutagens is an efficient strategy for improving the production yields of desired secondary metabolites, including CA, in diverse bacterial strains4,5,6. Recently, we generated improved CA-producing mutant strains from wild type (WT) S. clavuligerus NRRL 3585 through UV irradiation-induced random mutagenesis or metabolic engineering (Fig. 1a). For example, the S. clavuligerus C1 mutant strain (C1) generated through UV irradiation-induced mutagenesis exhibited an approximately two-to-three-fold increase in CA production compared to that of the WT strain. In addition, oleic acid (OA) can be an optimal carbon source for CA production; however, high OA concentrations in culture conditions inhibit the growth of S. clavuligerus strains. When bacteria were again challenged with UV irradiation in presence of high OA levels, the OA-resistant S. clavuligerus OR strain (OR) was selected7. The OR strain produces about eight-fold as much CA as that of the WT strain8. Notably, upon subjecting the OR strain to further UV irradiation, a null S. clavuligerus mutant strain (NL) with complete loss of CA biosynthetic activity was obtained. Although UV irradiation-induced random mutagenesis effectively generates diverse mutant strains with varied CA biosynthetic potentials, it is difficult to explain the genetic and molecular mechanisms simply by whole-genome sequencing of the mutated strains because genetic mutations occur at multiple loci in the genome. Transcriptional profile studies have therefore been suggested to compensate for the limitations of whole-genome sequencing9,10.

Fig. 1
figure 1

Experimental design and workflow. (a) Schematic diagram of the strain development process for the four S. clavuligerus strains, WT, C1, OR, and NL. (b) Biomass parameters measured as viscosity and OD, and CA production of the four S. clavuligerus strains. Sampling time points for RNA extraction is indicated by gray dashed lines. (c) Workflow of RNA-seq and data analysis.

To understand the molecular mechanisms underlying the increased or decreased CA-producing capabilities, comparative transcriptional profiles of the four S. clavuligerus strains were analyzed. Genome-scale RNA-sequencing (RNA-seq) data for the four S. clavuligerus strains, WT, C1, OR, and NL strain were generated in this study (Fig. 1). RNA samples for each strain were extracted in biological duplicates at four different time points based on CA production: 8, 14, 36, and 72 h, where 8 and 14 h correspond to before and right after CA production begins, respectively, and 36 and 72 h to the time points where CA is actively produced. Consequently, we generated a RNA-seq dataset consisting of 32 samples, with sequence reads in the transcriptional samples ranging from 10.69 to 38.68 million. These data can provide valuable insights into novel methods for enhancing CA production ability of industrial strains through rational or targeted metabolic engineering strategies.

Methods

Strain, culture condition, and CA quantification

In this study, four strains of S. clavuligerus, namely WT, C1, OR, and NL, were used (Fig. 1a,b). S. clavuligerus NRRL 3585 was purchased from American Type Culture Collection. C1 and OR are UV irradiation-induced mutants of S. clavuligerus NRRL 3585 possessing a greater CA production ability than the WT. OR exhibits resistance to high OA concentrations, whereas C1 does not. The NL strain is a UV irradiation-induced mutant derived from the OR strain but lacks the ability to produce CA. Spore stocks of S. clavuligerus NRRL 3585 and its mutant strains, C1, OR, and NL, were maintained in 25% glycerol at −80 °C. The inoculation of Streptomyces spores and main incubation conditions followed the methodology outlined in a previous study8. In brief, spores of the S. clavuligerus strains were inoculated into 20 mL of seed medium in a 100 mL baffled flask containing 20 g/L of starch, 30 g/L of soy flour, 23 g/L of triolein, and 1.2 g/L of phosphate. Subsequently, 0.8 mL of the seed culture broth was transferred to a 50 mL production medium, which included 10 g/L of starch, 20 g/L of soy flour, 10.5 g/L of 4-morpholinepropanesulfonic acid, 23 g/L of triolein, and 1.2 g/L of phosphate, along with 1 mL of trace elements containing 3 g of iron (III) chloride hexahydrate, 0.5 g of copper (II) chloride dihydrate, 0.5 g of zinc chloride, and 0.5 g of manganese sulfate monohydrate in 1 L of distilled water, and 1 mL/L of antifoam in a 500 mL baffled Erlenmeyer flask. Cell growth was measured in terms of various biomass parameters due to complexities in the CA production medium11. Viscosity and optical density (OD) were used as biomass parameters to calculate specific cell growth as described in our previous studies (Fig. 1b)8. OD was measured at 600 nm. Viscosity was measured with viscometer equipped with a sample adaptor (Model DV-E, Ametek Brookfield, USA) as previously described8. Samples for RNA extraction were collected based on the CA production at four different time points (8, 14, 36, and 72 h). Two replicates were harvested at each time points to serve as biological replicates. For CA quantification, 0.1 mL of culture broth was mixed with 20 mM acetate buffer at pH 6.6. After removal of insoluble materials with 0.25 μm nylon syringe filter, it was analyzed with Waters 600 high performance liquid chromatography (HPLC) system equipped with a multisolvent delivery pump, a controller, and a photodiode array detector 2996 for CA detection (Waters 600 model, Waters, USA). The analysis was conducted with HypersilTM BDS C18 HPLC column, particle size of 5 μm with 4.6 × 250 mm dimensions (Thermo Fisher, USA), 14 min isocratic method using 16.6 mM NaH2PO4 in methanol (86:14, v/v), flow rate of 1 mL/min, and UV observance at 238 nm. CA standard was used for quantification of CA production. All reagents for which no specific vendor information has been provided were obtained from Sigma-Aldrich (USA).

RNA extraction

The cells were harvested and washed with polysome buffer (140 mM NaCl, 20 mM Tris-HCl pH 7.5, and 5 mM MgCl2). Cell pellets were resuspended in lysis buffer (0.3 M sodium acetate pH 5.2, 10 mM EDTA, and 1% Triton X-100). The cell suspension was frozen using liquid nitrogen and then subjected to physical lysis by grinding with a mortar and pestle. The supernatant collected by centrifugation of the cell lysate at 4 °C for 10 min at 16000 × g was either used for RNA extraction, or stored at −80 °C. The supernatants were mixed with equal volumes of phenol, chloroform, and isoamyl alcohol (25:24:1) and subjected to centrifugation. After centrifugation, RNA was extracted from the aqueous layer and precipitated using ethanol.

cDNA Library preparation and sequencing

The RNA samples were treated with DNase I (New England Biolabs, USA) to remove any DNA contamination. cDNA library preparation, quality assessment, and RNA sequencing were performed by Macrogen Inc. (Republic of Korea). A TruSeq stranded mRNA (Illumina, USA) preparation kit was used as per the manufacturer’s protocol, for cDNA library preparation. The cDNA library was sequenced using the Illumina NovaSeq6000 platform.

Data processing of RNA-Seq reads

For data analysis, Trimmomatic v0.39 was employed to eliminate low-quality reads from the raw sequencing data in single-end mode using Phred + 33 quality score encoding12. The following steps were applied: removing leading and trailing bases with a quality score 3 or less, and sliding window trimming with a window size of four bases. If the average quality within the window dropped below 15, bases within the window were removed. Reads below the minimum length threshold of 36 bases were discarded. To ensure quality control of the RNA-seq reads before and after trimming, FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) v0.11.9 was used. Subsequently, the resulting trimmed single-end reads were aligned to the reference genome using Bowtie2 (v.2.3.4.3) with the parameters “–D 20 –R 3 –N 0 –L 20 –i S,1,0.50”13. The reference genome accession numbers for S. clavuligerus are NZ_CP027858.1, for the chromosome, and NZ_CP027859.1 for the plasmid, respectively, and both are available at NCBI Assembly GCA_005519465.114. The mapped data were then processed to obtain expression count data using FeatureCounts v2.0.315.

Time-course gene expression changes and principal component analysis

The expression count data for each gene were normalized using the DESeq2 package (version 1.40.1) in R (version 4.3.0)16. A circos plot was generated using the Circlize package (version 0.4.15) in R (Fig. 2a)17. Genes in the reference genome were sorted based on their positions on the chromosome. Each bar plot was generated based on the mean normalized gene expression counts from duplicate samples. Notably, the circos plot suggested that large deletions in the plasmid may have occurred in the mutant strains C1, OR, and NL. Principal component analysis (PCA) was performed on the samples, using the plotPCA function with the parameter “ntop” set to 300 in DESeq2 (Fig. 2b).

Fig. 2
figure 2

(a) Circos plots were used to visualize the RNA-seq data. The outermost circle represents the reference linear chromosome (blue) and the plasmid (red). Numbers on the outermost circle indicate chromosomal positions in Mb. The inner four tracks display bar plots based on the mean normalized gene expression counts for each biological duplicate sample at different growth phases. (b) The results of PCA on the gene expression profiles of each sample are visualized.

Identification of differentially expressed genes

Differentially expressed genes (DEGs) in the three S. clavuligerus NRRL 3585 mutant strains were analyzed based on the WT transcriptome data (Fig. 3). The DEGs were identified using the DESeq2 library in R, with a false discovery rate (FDR) cutoff of <0.001 and a log fold expression change greater than 1 or less than −1. For each S. clavuligerus strain, the up-regulated and down-regulated DEGs compared to those of the wild-type, at corresponding time points were identified18.

Fig. 3
figure 3

The Venn diagram analysis depicts the comparison of up-regulated and down-regulated DEGs in RNA-seq results of each S. clavuligerus mutant strain at the corresponding timepoint compared to the WT. The DEGs were identified using DESeq2 with a FDR cutoff of <0.001 and an absolute value of minimum fold change of >118.

Data Records

All RNA-seq raw read FASTQ files were deposited in the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) under accession number SRP45395019. The DEGs for each comparison were deposited at the Figshare database18.

Technical Validation

Validation of RNA sequencing reads

A total of 32 RNA-seq libraries of S. clavuligerus NRRL 3585 and its mutant strains were generated. The data included NGS libraries generated at four different growth phases with biological duplicates (Fig. 1). FastQC was utilized for the quality assessment of Illumina RNA-seq reads, which indicated high sequencing quality, with an average of 99.16% of the trimmed reads remaining after quality scoring and nucleotide length trimming. After trimming, sequencing resulted in 10.60 to 38.43 million reads per library (Table 1). The RNA-seq samples had an average read length of approximately 100 bp (Fig. 4a). All RNA-seq data exhibited an average Phred score of 36 or higher, suggesting base-calling error probabilities lower than 10−3 at a minimum (Fig. 4b,c)20,21. Across all samples, an average of 96.9% of reads had a Phred score of 30 or higher. Based on these quality validation results, we confirmed the quality of all the obtained RNA sequencing reads prior to subsequent downstream analysis.

Table 1 Overall statistics of RNA-Seq data.
Fig. 4
figure 4

Technical Validations of RNA-seq samples from four different growth phases of four strains of S. clavuligerus. (a) The analysis of read length distribution was performed on trimmed reads obtained from RNA-seq samples of four different strains. For each growth phase, there were two replicates indicated by suffixes “_1” or “_2” after the time point. (b) The barplots depict the percentage of reads with a Phred score of 30 or higher for each RNA-seq sample. The average Phred scores of the reads after trimming were analyzed across the RNA-seq samples of the S. clavuligerus strains during various growth phases. (c) In the RNA-seq samples of four strains, the distribution of median Phred quality scores at each base position was examined. (d) The log2 normalized expression counts obtained from the DESeq2 package in the RNA-seq samples of the S. clavuligerus strains were visualized through violin and box plots.

Evaluation of transcriptome data

The reads were aligned to the reference genome, resulting in an average mapping rate of 74.24% for RNA-seq, indicating a substantial proportion of mapped gene reads. The distribution of log2 (DESeq normalized count +1) exhibited a wide range, from 0 to 18 (Fig. 4d). Transcriptome profile visualization revealed a distinct pattern highlighting the differences between the WT and randomly mutated strains (Fig. 2a). PCA was performed to ensure the reproducibility of the biological duplicates. Overall, the plots demonstrated high reproducibility across all replicates (Fig. 2b). The DEGs of the three mutant strains were compared to those of the WT strain to evaluate the utility of transcriptome data in unraveling the molecular mechanism underlying the high CA production ability (Fig. 3). Each randomly mutated strain with different CA biosynthetic ability exhibited both, common and distinct DEGs at various time points. This suggests that the comparative analysis of transcriptome data can potentially uncover efficient strategies for enhancing CA productivity thereby producing improved industrial strains.