Diversity of nonribosomal peptide synthetase and polyketide synthase gene clusters among taxonomically close Streptomyces strains

To identify the species of butyrolactol-producing Streptomyces strain TP-A0882, whole-genome sequencing of three type strains in a close taxonomic relationship was performed. In silico DNA-DNA hybridization using the genome sequences suggested that Streptomyces sp. TP-A0882 is classified as Streptomyces diastaticus subsp. ardesiacus. Strain TP-A0882, S. diastaticus subsp. ardesiacus NBRC 15402T, Streptomyces coelicoflavus NBRC 15399T, and Streptomyces rubrogriseus NBRC 15455T harbor at least 14, 14, 10, and 12 biosynthetic gene clusters (BGCs), respectively, coding for nonribosomal peptide synthetases (NRPSs) and polyketide synthases (PKSs). All 14 gene clusters were shared by S. diastaticus subsp. ardesiacus strains TP-A0882 and NBRC 15402T, while only four gene clusters were shared by the three distinct species. Although BGCs for bacteriocin, ectoine, indole, melanin, siderophores such as deferrioxamine, and terpenes such as albaflavenone, hopene, carotenoid, and geosmin are shared by the three species, many BGCs for secondary metabolites such as butyrolactone, lantipeptides, oligosaccharide, and some terpenes are species-specific. These results indicate the possibility that strains belonging to the same species possess the same set of secondary metabolite-biosynthetic pathways, whereas strains belonging to distinct species have species-specific pathways in addition to some common pathways, even if the strains are taxonomically close.

A large number of bioactive secondary metabolites have been discovered from actinomycetes 1,2. In the past, each secondary metabolite producer was taxonomically identified at the species level on the basis of morphological, cultural, physiological, and chemical features. Consequently, data correlating each species with its secondary metabolites have steadily accumulated. For example, Streptomyces griseus, Streptomyces avermitilis, and Streptomyces tsukubensis are well known to produce streptomycin, avermectin, and tacrolimus, respectively [3][4][5]. However, the taxonomic positions of strains producing new secondary metabolites are usually determined at the genus level based on their 16S rRNA gene sequences, and species-level assignment is not always performed in natural product research. Although species-level classification of secondary metabolite producers provides crucial information for researchers seeking new microbial compounds, the relationship between species names and secondary metabolites is unclear in most cases.
Genome analyses of actinomycetes are revealing that various biosynthetic gene clusters (BGCs) for secondary metabolites are encoded in their genomes and that about half to three quarters of the clusters are associated with nonribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) pathways 6, which suggests that nonribosomal peptides, polyketides, and their hybrid compounds are the major secondary metabolites of actinomycetes. These compounds often show pharmaceutically useful bioactivities, and many have been developed into drugs such as antibiotics, anticancer agents, and immunosuppressants. Hence, genome analysis focused on NRPS and PKS gene clusters is now often employed to evaluate the capacity of actinomycete strains for secondary metabolite production [7][8][9][10].
Scientific REPORTS | (2018) 8:6888 | DOI: 10.1038/s41598-018-24921-y
The marine-derived strain Streptomyces sp. TP-A0882 produces butyrolactol 11. We recently identified the gene clusters responsible for butyrolactol and thiazostatin biosynthesis in this strain using whole-genome analysis 12. In the present study, we sequenced the genomes of three type strains closely related taxonomically to strain TP-A0882 and conducted in silico DNA-DNA hybridization (DDH) to identify this strain at the species level. We further analyzed secondary metabolite-BGCs (smBGCs) such as NRPS and PKS gene clusters in each of the genomes to elucidate the diversity of secondary metabolite-biosynthetic pathways among the taxonomically close species and to provide information useful for researchers screening Streptomyces strains for new compounds.

Results
Taxonomic identification of butyrolactol-producing Streptomyces sp. TP-A0882. The […] Tables 3 and 4. The numbers and types of gene clusters are the same as those of Streptomyces sp. TP-A0882, and the sequences show >99% amino acid sequence identity to those of Streptomyces sp. TP-A0882 (NBRC 110030) based on BLAST analysis in all cases except ORF77-1 and ORF80-1 (Table 4). The structures of the predicted products of the gene clusters from NBRC 15402T also matched those of TP-A0882. These results suggest that the two S. diastaticus subsp. ardesiacus strains contain identical NRPS and PKS pathways.
S. coelicoflavus NBRC 15399T harbors four nrps clusters, two pks/nrps clusters, three t2pks clusters, and one t3pks cluster, as shown in Table 5. Unlike typical Streptomyces strains, this strain has no t1pks cluster. nrps-i, nrps-ii, pks/nrps-i, t2pks-i, and t3pks-i were predicted to be responsible for the synthesis of coelibactin, coelichelin, prodiginine, gray spore pigment, and tetrahydroxynaphthalene (THN), respectively, based on high similarities (85-99% amino acid sequence identity) to SCO7681-7683, SCO0492 (CchH), SCO5886-SCO5894 (Red), SCO5318-SCO5316 (WhiE), and SCO1206 (RppA) of Streptomyces coelicolor A3(2) 6,13. Based on the domain and module organizations and the substrate-selective residues in the A domains, nrps-iii and nrps-iv were predicted to synthesize nonribosomal peptides consisting of eight and 13 amino acids, respectively. The product of pks/nrps-ii was speculated to be a novel oxazolomycin analog because the domain organization is similar, but not identical, to that of the BGCs for oxazolomycins 14. Although the remaining two gene clusters (t2pks-ii, t2pks-iii) are likely to be responsible for the synthesis of aromatic polyketides, their structures could not be predicted from the sequence information alone. Analysis of the genome sequence of S. coelicoflavus strain ZG0656, the only S. coelicoflavus strain whose genome sequence has been published 15, indicated that all of the S. coelicoflavus NBRC 15399T gene clusters (Table 5) are also present in strain ZG0656, with >97% amino acid sequence identity based on BLAST comparisons.
S. rubrogriseus NBRC 15455T harbors four nrps clusters, one pks/nrps cluster, at least three t1pks clusters, two t2pks clusters, and two t3pks clusters (Table 6). nrps-a, nrps-b, nrps-c, pks/nrps-a, t1pks-a, t1pks-b, t2pks-a, t3pks-a, and t3pks-b were predicted to be responsible for the synthesis of coelibactin, coelichelin, calcium-dependent antibiotic (CDA), prodiginine, coelimycin, eicosapentaenoic acid, gray spore pigment, THN, and phenolic acid, respectively, based on high similarities (91-100% amino acid sequence identity) to SCO7681-7683, SCO0492 (CchH), SCO3230-SCO3232 (CDA peptide synthetases), SCO5886-SCO5894 (Red), SCO6275-SCO6273 (Cpk), […] but their predicted PKS proteins do not have high sequence similarity to known PKS proteins, suggesting that the product(s) might be novel. t2pks-b is likely to synthesize aromatic polyketides, but the products could not be predicted because the sequence does not show a high degree of similarity to any PKS whose products have been elucidated. Among the 12 gene clusters, all except the other t1pks genes and t2pks-b show >93% sequence similarity to the corresponding genes from S. coelicolor A3(2), suggesting that most of the gene clusters in S. rubrogriseus NBRC 15455T are also present in S. coelicolor A3(2).
Table 3. Numbers of secondary metabolite-biosynthetic gene clusters (smBGCs) encoded in each genome. a As some type-I PKS gene clusters were not completely sequenced, exact numbers are unclear. b Not detected.
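The comparison of predicted products between genomes reduces to set operations. The following is a minimal Python sketch, not the workflow used in the study. It uses only the products assigned above by similarity to S. coelicolor A3(2) genes for S. coelicoflavus NBRC 15399T and S. rubrogriseus NBRC 15455T; clusters without a named product (for example nrps-iii, nrps-iv, t2pks-ii, and t2pks-b) are omitted, so the sets are illustrative and incomplete. The variable and function names are hypothetical.

```python
# Predicted products named in the text (partial lists; clusters whose
# products were not named are deliberately left out of this illustration).
named_products = {
    "S. coelicoflavus NBRC 15399T": {
        "coelibactin", "coelichelin", "prodiginine",
        "gray spore pigment", "THN",
    },
    "S. rubrogriseus NBRC 15455T": {
        "coelibactin", "coelichelin", "CDA", "prodiginine", "coelimycin",
        "eicosapentaenoic acid", "gray spore pigment", "THN", "phenolic acid",
    },
}


def shared_products(products_by_strain):
    """Return the products that appear in every strain's set."""
    sets = list(products_by_strain.values())
    return set.intersection(*sets)


shared = shared_products(named_products)
print(sorted(shared))
```

Because the input sets are partial, a product missing from one set here should not be read as evidence that the corresponding cluster is absent from that genome.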

Conservation of NRPS and PKS gene clusters among taxonomically close species.
As summarized in Fig. 1a

The other secondary metabolite-biosynthetic gene clusters. In addition to NRPS and PKS gene clusters, the other smBGCs were also investigated. Thirteen to 18 gene clusters are encoded in each genome, as shown in Table 3. Table 7 lists the clusters with their putative products and loci. Homologous gene clusters are aligned in the same row in the […] Fig. 1b).

Discussion
The genome analysis conducted in this study shows that S. diastaticus subsp. ardesiacus strains TP-A0882 and NBRC 15402T share an almost identical set of smBGCs, while S. coelicoflavus strains NBRC 15399T and ZG0656 share their own similar set of gene clusters. Previous studies on Nocardia brasiliensis 8 and Salinispora species 16 have also shown that most smBGCs are common within each species, with strain-specific ones being relatively limited. These results suggest that actinomycete strains belonging to the same species are likely to possess similar secondary metabolite-biosynthetic pathways.
In contrast, only a limited number of smBGCs are shared by the different species examined in this study, even though they have >99% 16S rRNA gene sequence similarity and are thus considered taxonomically close. We identified a total of 49 different smBGCs, including 25 NRPS and PKS gene clusters, from the three species. Among them, 14 clusters, responsible for production of coelibactin, coelichelin, gray spore pigment, THN, bacteriocin, ectoine, indole, melanin, two types of NRPS-independent siderophores, and four types of terpenes, are conserved among the three species, while an additional five clusters for phenolic lipid, prodiginine, nonribosomal peptide, lantipeptide, and terpene syntheses are shared by two species. Coelibactin and coelichelin are iron-chelating molecules, known as siderophores, that are involved in the uptake of ferric iron 17. Like gray spore pigment and melanin, THN is involved in pigmentation, as it is used in melanin formation 18. Pigment production is often examined in taxonomic studies 19. Phenolic lipids are components of the cell wall and are involved in resistance to β-lactam antibiotics by affecting the characteristics and rigidity of the cytoplasmic membrane/peptidoglycan 20. Ectoine is an osmolyte involved in protection against extreme osmotic stress 21. Therefore, many of the conserved/shared gene clusters identified in this study are physiologically and/or taxonomically important. The remaining 33 smBGCs are species-specific, with each of the three species containing eleven different specific clusters.
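The NRPS and PKS cluster counts can be cross-checked by inclusion-exclusion. The Python sketch below takes the per-species counts reported in this paper (14, 10, and at least 12) and the four NRPS/PKS clusters conserved in all three species (coelibactin, coelichelin, gray spore pigment, and THN). It assumes that three of the five two-species clusters (phenolic lipid, prodiginine, and the nonribosomal peptide) are the NRPS/PKS ones, which is an inference from their names and not a statement in the text. The function name is hypothetical.

```python
def distinct_cluster_count(per_species_counts, in_all_three, in_exactly_two):
    """Count distinct clusters across three species.

    A cluster present in all three species is counted three times in the
    per-species sum, and one present in exactly two species is counted twice,
    so the surplus copies (two and one, respectively) are subtracted.
    """
    if len(per_species_counts) != 3:
        raise ValueError("this sketch handles exactly three species")
    return sum(per_species_counts) - 2 * in_all_three - in_exactly_two


# 14 (S. diastaticus subsp. ardesiacus), 10 (S. coelicoflavus), 12 (S. rubrogriseus);
# 4 clusters conserved in all three; 3 assumed to be shared by exactly two.
nrps_pks_total = distinct_cluster_count([14, 10, 12], in_all_three=4, in_exactly_two=3)
print(nrps_pks_total)  # 25
```

Under that assumption the result agrees with the 25 NRPS and PKS gene clusters reported for the three species.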
Unexpectedly, most of the gene clusters in S. rubrogriseus NBRC 15455T are also present in S. coelicolor A3(2) (correctly classified as Streptomyces violaceoruber) 22. As the sequence similarities in these regions are very high (>93%), we considered it possible that strains NBRC 15455T and A3(2) might actually belong to the same species. To clarify this, we conducted in silico DDH analysis of the two genome sequences. The resulting estimated DDH value is 70.3% (67.3-73.2%), which lies on the borderline for assigning two strains to the same or different species, and the probability that the value exceeds 70% was calculated to be 78.9% (data not shown). Orthologs of the other t1pks cluster(s) and t2pks-b found in S. rubrogriseus NBRC 15455T (Table 6) were not identified in S. coelicolor A3(2), while orthologs of SCO5073-SCO5092 (actinorhodin), SCO6826-SCO6827, SCO7669-SCO7671 (aromatic polyketide), SCO7221 (germicidin), SCP1.228c-SCP1.246 (methylenomycin), SCO0381-SCO0401, and SCO7700-SCO7701 (2-methylisoborneol), present in S. coelicolor A3(2), could not be […]
[…] shown as "…". b Parentheses indicate that the closest homolog is not from Streptomyces sp. TP-A0882 (NBRC 110030). c Encoded on the complementary strand. d Although homologs in Streptomyces sp. TP-A0882 did not appear as high-score hits in basic local alignment search tool analyses because they are not registered in GenBank, they are present in scaffolds 13 (BBOK01000009), 22

nrps-iv: Gly-y-Asp-Tyl-Thr-x-Asp-Gly-Pro-Gly-Gly-Ala-mGly
Here, we have shown an example in which actinomycete strains belonging to the same species share a conserved set of smBGCs, whereas different species each harbor species-specific smBGCs in addition to some common ones, even if the species are taxonomically close. Relationships between species and smBGCs in actinomycetes have been reported by Doroghazi et al. 24, Ziemert et al. 16, and Seipke et al. 25. Because the study by Doroghazi et al. was a large-scale analysis of 840 taxonomically diverse actinobacterial strains encompassing many genera, it did not compare smBGCs between taxonomically close Streptomyces species. Ziemert et al. reported the diversity and evolution of PKS and NRPS gene clusters within the genus Salinispora. In contrast to rare actinomycetes such as Salinispora, relationships between species and smBGCs are less well elucidated in the genus Streptomyces. Seipke et al. showed strain-level diversity of smBGCs in S. albus. However, the strains were actually not S. albus 23 and may not belong to a single species but may instead be divided into two independent genomospecies whose in silico DDH value is less than 70% (our unpublished data). As the genus Streptomyces includes many species, data for more Streptomyces species must be accumulated to clarify whether smBGCs are diverse at the strain level or conserved at the species level. As reported here, genome sequence-based analysis will provide more insight into the relationships between Streptomyces species and their secondary metabolites.
Table 7. Loci encoding the other smBGCs in the draft genome sequences. a When antiSMASH reported >40% gene similarity, the output was taken as the putative product; b Locus is shown as start-end positions and scaffold no. (sxx means scaffold000xx); c Although antiSMASH output product names, the gene similarities were less than 40%, so the products are shown as unidentified; d Not detected; e No output; f AVLINLDhb(didehydrobutyrine)DDGCGDha(didehydroalanine)DhbCDhaDhaPCADhbNVA & CNGDhaCADhbNVA in S. diastaticus subsp. ardesiacus TP-A0882, DhaDGGCGDhaDhbCGNACIDhaDhaGDha, INLDhbDDGCGDhaDhbCDhaDhaPCADhbNVA & CKGDhaCADhbNVA in S. coelicoflavus NBRC 15399T; g Core peptide amino acid sequence predicted by antiSMASH; h Based on the similarity to BGCs for geosmin.
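The in silico DDH comparison of strains NBRC 15455T and A3(2) reported in the Discussion (70.3%, interval 67.3-73.2%) illustrates why a point estimate alone may not settle a species assignment. The following minimal Python sketch assumes the conventional 70% DDH cutoff for species delineation, which the text treats as the borderline, and classifies an estimate by whether its interval lies wholly above, wholly below, or across the cutoff. The helper name and the three labels are hypothetical and are not output of any DDH tool.

```python
def ddh_call(estimate, ci_low, ci_high, cutoff=70.0):
    """Classify an in silico DDH estimate (percent) against a species cutoff."""
    if not ci_low <= estimate <= ci_high:
        raise ValueError("estimate must lie within its confidence interval")
    if ci_low >= cutoff:
        return "same species"       # whole interval at or above the cutoff
    if ci_high < cutoff:
        return "different species"  # whole interval below the cutoff
    return "borderline"             # interval straddles the cutoff


# Values reported in the text for NBRC 15455T versus A3(2).
print(ddh_call(70.3, 67.3, 73.2))
```

With the reported values the interval straddles 70%, which matches the authors' description of the result as borderline.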