Thirty complete Streptomyces genome sequences for mining novel secondary metabolite biosynthetic gene clusters

Lee, Namil; Kim, Woori; Hwang, Soonkyu; Lee, Yongjae; Cho, Suhyung; Palsson, Bernhard; Cho, Byung-Kwan

doi:10.1038/s41597-020-0395-9

Data Descriptor
Open access
Published: 13 February 2020

Namil Lee¹^na1,
Woori Kim¹^na1,
Soonkyu Hwang¹,
Yongjae Lee¹,
Suhyung Cho¹,
Bernhard Palsson ORCID: orcid.org/0000-0003-2357-6785^3,4,5 &
…
Byung-Kwan Cho ORCID: orcid.org/0000-0003-4788-4184^1,2,5

Scientific Data volume 7, Article number: 55 (2020)

Abstract

Streptomyces are Gram-positive bacteria of significant industrial importance due to their ability to produce a wide range of antibiotics and bioactive secondary metabolites. Recent advances in genome mining have revealed that Streptomyces genomes possess a large number of unexplored silent secondary metabolite biosynthetic gene clusters (smBGCs). This indicates that Streptomyces genomes continue to be an invaluable source for new drug discovery. Here, we present high-quality genome sequences of 22 Streptomyces species and eight different Streptomyces venezuelae strains assembled by a hybrid strategy exploiting both long-read and short-read genome sequencing methods. The assembled genomes have more than 97.4% gene space completeness and total lengths ranging from 6.7 to 10.1 Mbp. Their annotation identified 7,000 protein coding genes, 20 rRNAs, and 68 tRNAs on average. In silico prediction of smBGCs identified a total of 922 clusters, including many clusters whose products are unknown. We anticipate that the availability of these genomes will accelerate discovery of novel secondary metabolites from Streptomyces and elucidate complex smBGC regulation.

Measurement(s)	DNA • genome • sequence_assembly • sequence feature annotation
Technology Type(s)	DNA sequencing • sequence assembly process • sequence annotation
Factor Type(s)	strain
Sample Characteristic - Organism	Streptomyces

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.11791323

Background & Summary

With the rapid emergence of antimicrobial resistance (AMR) to all major classes of antibiotics and the decline in the number of potential candidates for new antibiotics, there is a pressing need for the discovery of novel antibacterial compounds¹. Streptomyces, soil-dwelling Gram-positive bacteria, continue to be promising microorganisms for the production of clinically important secondary metabolites, including not only antibiotics, but also antiviral, antifungal, and antiparasitic agents, and antitumor and immunosuppressant compounds². Streptomyces are distinguished by their complex life cycle and high G + C content (often over 70%) in their linear genomes^3,4. Traditionally, drug discovery from Streptomyces has been based on bioactivity screening followed by mass spectrometry and NMR-based molecular identification⁵. However, recent advances in genomics-based approaches revealed that most of the secondary metabolite biosynthetic gene clusters (smBGCs) of streptomycetes are inactive under laboratory conditions, suggesting that the ability of streptomycetes to produce secondary metabolites has been underestimated^5,6. Each Streptomyces species has the genetic potential to produce more than 30 secondary metabolites on average, which are diverse and differ between species^7,8. Considering that Streptomyces is the largest genus of actinobacteria, with approximately 900 species characterized so far, streptomycetes are a valuable resource for the discovery of novel secondary metabolites⁹.

SmBGCs, especially polyketide and non-ribosomal peptide synthetase types, are often composed of extraordinarily long genes (>5 kb) encoding multi-modular enzymes with repetitive domain structures. Therefore, accurate gene annotations based on high-quality genome sequences are essential for the precise identification of smBGCs¹⁰. For example, annotation of the high-quality S. clavuligerus genome revealed that 30% of its 7,163 protein coding genes had been incorrectly annotated in the previous draft genome, which contained ambiguous and inaccurate nucleotides, indicating the importance of high-quality genome sequences¹¹. In addition, high-quality genome sequences are essential for multi-omics analysis, which facilitates the understanding of the complex regulation of smBGCs and rational engineering for increasing secondary metabolite production^11,12.

Among the 1,614 streptomycetes genomes deposited in the NCBI Assembly database as of 9 December 2019, only 189 and 35 assemblies were designated as complete genome level and chromosome level, respectively. More than 86% of assemblies were draft-quality genome sequences, which contain multiple fragmented contigs or ambiguous sequences^4,13,14,15. One of the main obstacles to obtaining high-quality genomic information for streptomycetes is the low fidelity of sequencing techniques when dealing with high G + C genomes and frequent repetitive sequences such as terminal inverted repeats¹³. In addition, since streptomycetes have linear chromosomes, it is difficult to confirm the completeness of the assembled chromosome.

In this study, we present the high-quality genome sequences of 30 streptomycetes, increasing the total number of reported complete Streptomyces genomes by about 10%. The target streptomycetes were 22 Streptomyces type strains and eight different Streptomyces venezuelae strains, most of which are currently used as industrial strains for producing various bioactive compounds. We applied a hybrid assembly strategy combining long-read (PacBio) and short-read (Illumina) sequencing techniques to obtain complete genome sequences. PacBio sequencing provides reads several kb in length, which allow read-through of low-complexity regions and thereby enable the assembly of repetitive regions that are difficult to assemble from Illumina sequencing reads, even with high-coverage data¹⁶. However, Illumina sequencing provides reads with a lower error rate than PacBio sequencing, and contigs assembled from Illumina sequencing reads are not simply a subset of the contigs assembled from PacBio sequencing reads^13,17. Therefore, combining PacBio and Illumina sequencing makes it possible to generate more complete genomes by overcoming the shortcomings of each method. Using reads from PacBio (0.5 to 3.0 Gbp) and Illumina (0.46 to 5.18 Gbp) sequencing, we assembled streptomycetes genomes of 6.7 to 10.1 Mbp, most of which consist of a single chromosome, with an average G + C content of 72%. Inaccurate sequences in the assembled genomes were corrected using Illumina sequencing reads. The complete streptomycetes genomes have more than 97.4% gene space completeness, and on average 7,000 protein coding genes, 20 rRNAs, and 68 tRNAs were annotated. Finally, based on the complete genome sequences and annotations, we predicted a total of 922 smBGCs.
The complete genome sequences and newly determined smBGCs in this study should prove to be a fundamental resource for understanding the genetic basis of streptomycetes and for discovering novel secondary metabolites.

Methods

Genomic DNA (gDNA) extraction

A total of 30 streptomycetes were purchased from the Korean Collection for Type Cultures (KCTC, Korea). A stock of each streptomycete was inoculated into 50 mL of liquid culture medium with 0.16 g mL⁻¹ of glass beads (3 ± 0.3 mm diameter) in a 250 mL baffled flask and grown at 30 °C in an orbital shaker at 200 rpm. Each streptomycete was grown in one of four culture media: R5(–) medium (25 mM TES (pH 7.2), 103 g L⁻¹ sucrose, 1% (w/v) glucose, 5 g L⁻¹ yeast extract, 10.12 g L⁻¹ MgCl₂∙6H₂O, 0.25 g L⁻¹ K₂SO₄, 0.1 g L⁻¹ casamino acids, 0.08 g L⁻¹ ZnCl₂, 0.4 mg L⁻¹ FeCl₃, 0.02 mg L⁻¹ CuCl₂∙2H₂O, 0.02 mg L⁻¹ MnCl₂∙4H₂O, 0.02 mg L⁻¹ Na₂B₄O₇∙10H₂O, and 0.02 mg L⁻¹ (NH₄)₆Mo₇O₂₄∙4H₂O), 1 × sporulation medium (3.33 g L⁻¹ glucose, 1 g L⁻¹ yeast extract, 1 g L⁻¹ beef extract, 2 g L⁻¹ tryptose, and 0.006 g L⁻¹ FeSO₄∙7H₂O), YEME medium (340 g L⁻¹ sucrose, 10 g L⁻¹ glucose, 3 g L⁻¹ yeast extract, 5 g L⁻¹ bacto peptone, and 3 g L⁻¹ oxoid malt extract), or MYM medium (4 g L⁻¹ maltose, 4 g L⁻¹ yeast extract, 10 g L⁻¹ malt extract). For gDNA extraction, 25 mL of cultured cells was harvested at the exponential growth phase and washed twice with the same volume of 10 mM EDTA, followed by lysozyme (10 mg mL⁻¹) treatment at 37 °C for 45 min. gDNA was extracted using a Wizard Genomic DNA Purification Kit (Promega, Madison, WI, USA) according to the manufacturer’s instructions. The quality and quantity of the extracted gDNA samples were evaluated using 1% agarose gel electrophoresis and Nanodrop (Thermo Fisher Scientific, Waltham, MA, USA), respectively.

Short-read (Illumina) genome sequencing

To construct the short-read genome sequencing library, 2.5 μg of gDNA was sheared to approximately 350 bp with a Covaris instrument (Covaris Inc., Woburn, MA, USA) under the following conditions: Power 175, Duty factor 20%, C. burst 200, Time 23 s, 8 times. The library was constructed using a TruSeq DNA PCR-Free LT kit (Illumina Inc., San Diego, CA, USA) following the manufacturer’s instructions. Briefly, the fragmented DNA samples were cleaned and end-repaired, followed by adaptor ligation and bead-based size selection for fragments of 400 to 500 bp. The quantity of the final libraries was measured using a Qubit® dsDNA HS Assay Kit (Thermo Fisher Scientific) and the library size was determined using an Agilent 2200 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Of the constructed sequencing libraries, 29 were sequenced on the HiSeq 2500 (Illumina Inc.) as 100 bp single-end reads, and the remaining library, for S. tsukubaensis, was sequenced on the MiSeq v.2 (Illumina Inc.) with a 50 bp single-read recipe. Finally, 0.46 to 5.18 Gbp of raw sequence data were obtained, and read quality was examined using the sequencing QC report function of CLC Genomics Workbench version 6.5.1 (CLC bio, Denmark) (Online-only Table 1 and Fig. 1a).

Long-read (PacBio) genome sequencing

A total of 5 μg of gDNA was used as input for PacBio genome sequencing library preparation. The sequencing library was constructed with the PacBio SMRTbell™ Template Prep Kit (Pacific Biosciences, Menlo Park, CA, USA) following the manufacturer’s instructions. Fragments smaller than 20 kbp were removed using the Blue Pippin Size selection system (Sage Science, Beverly, MA, USA) and the constructed libraries were validated using an Agilent 2100 Bioanalyzer (Agilent Technologies). Final SMRTbell libraries were sequenced using one or two SMRT cells with P6-C4 chemistry (DNA Sequencing Reagent 4.0) on the PacBio RS II sequencing platform (Pacific Biosciences). Approximately 0.5 to 3.0 Gbp of raw sequence data were generated (Online-only Table 1).

Genome assembly

Among the raw PacBio sequencing reads, only reads with a read quality value greater than 0.75 and a length longer than 50 bp were retained (Fig. 1b). The filtered reads were assembled by the hierarchical genome assembly process workflow (HGAP, Version 2.3), including consensus polishing with Quiver¹⁸. For each assembled contig, error correction was performed based on the estimated genome size and average coverage. Raw reads from Illumina sequencing were quality trimmed using CLC Genomics Workbench version 6.5.1 (ambiguous limit 2 and quality limit 0.05) and assembled using the de novo assembly function of CLC Genomics Workbench version 6.5.1 with default parameters. To extend the assembled contigs, all assembled PacBio and Illumina contigs were aligned using MAUVE 2.4.0¹⁹ and linked using the GAP5 program (Staden package)²⁰.
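The read filter described above can be illustrated with a minimal Python sketch. This is not the HGAP implementation: the `Read` class, its field names, and the toy reads are hypothetical, and only the two thresholds (read quality above 0.75, length above 50 bp) come from the text.

```python
from dataclasses import dataclass

# Thresholds taken from the Methods text; both comparisons are strict.
MIN_READ_QUALITY = 0.75
MIN_READ_LENGTH_BP = 50


@dataclass
class Read:
    """Hypothetical container for one long read (names are illustrative)."""
    name: str
    sequence: str
    read_quality: float  # per-read quality value between 0 and 1


def passes_filter(read: Read) -> bool:
    """Keep a read only if it exceeds both the quality and length cut-offs."""
    return (read.read_quality > MIN_READ_QUALITY
            and len(read.sequence) > MIN_READ_LENGTH_BP)


def filter_reads(reads):
    """Return the reads that pass the filter, preserving input order."""
    return [read for read in reads if passes_filter(read)]


# Toy input: one good read, one too short (40 bp), one with low quality.
toy_reads = [
    Read("read_1", "ACGT" * 3000, 0.86),
    Read("read_2", "ACGT" * 10, 0.90),
    Read("read_3", "GGCC" * 500, 0.70),
]
kept_names = [read.name for read in filter_reads(toy_reads)]
print(kept_names)
```

In this toy example only `read_1` passes, since `read_2` is shorter than the length cut-off and `read_3` falls below the quality cut-off.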

Genome correction

Quality-trimmed Illumina sequencing reads were mapped to the assembled genome using CLC Genomics Workbench version 6.5.1 (mismatch cost 2, insertion cost 3, deletion cost 3, length fraction 0.9, and similarity fraction 0.9). At conflict sites where more than 80% of the mapped Illumina reads supported an alternative sequence, the assembly was corrected to the Illumina sequence (Table 1). In addition, the percentage of Illumina reads that mapped to the assembled genome was used as an indicator of the degree of completeness (Table 1 and Fig. 2b). Completeness of gene space was estimated using BUSCO v3 (Table 2)²¹.
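The 80% rule for conflict correction can be sketched as follows. This is a simplified illustration, not the CLC workflow: it handles single-base substitutions only (no insertions or deletions), and the `pileup` dictionary of per-position Illumina base calls is a hypothetical input format chosen for the example.

```python
from collections import Counter

# Threshold from the Methods text: a conflict is corrected only when more
# than 80% of the Illumina base calls at that site support the alternative.
MIN_ILLUMINA_FREQUENCY = 0.80


def corrected_base(assembled_base, illumina_bases):
    """Return the base to keep at one position of the assembly."""
    if not illumina_bases:
        return assembled_base  # no short-read evidence, keep the assembly
    top_base, top_count = Counter(illumina_bases).most_common(1)[0]
    frequency = top_count / len(illumina_bases)
    if top_base != assembled_base and frequency > MIN_ILLUMINA_FREQUENCY:
        return top_base
    return assembled_base


def correct_sequence(assembled, pileup):
    """Apply the rule at every position.

    `pileup` (hypothetical format) maps a 0-based position to the list of
    Illumina base calls aligned there; positions without an entry are kept.
    """
    return "".join(
        corrected_base(base, pileup.get(position, []))
        for position, base in enumerate(assembled)
    )


# Toy example: position 1 has 90% support for T (corrected),
# position 2 has only 60% support for A (left unchanged).
toy_pileup = {
    1: ["T"] * 9 + ["C"],
    2: ["A"] * 6 + ["G"] * 4,
}
corrected = correct_sequence("ACGT", toy_pileup)
print(corrected)
```

The strict comparison means that a site with exactly 80% support for the Illumina base is left as assembled, matching the "more than 80%" wording.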

Table 1 The statistics of genome assembly and correction.

Table 2 Gene space completeness of completed genomes.

Genome annotation and secondary metabolite biosynthetic gene cluster prediction

The complete genome sequences of streptomycetes were submitted to the NCBI GenBank database and annotated by the latest version of the NCBI Prokaryotic Genome Annotation Pipeline (PGAP)²². Using the GenBank-formatted file of each genome as input, secondary metabolite biosynthetic gene clusters were predicted by antiSMASH 4.0²³.
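For readers who want to tally annotated features (for example CDS, rRNA, and tRNA) directly from a GenBank-formatted file, a minimal standard-library sketch is shown below. It relies only on the flat-file layout, in which feature keys start in column 6 of the FEATURES block; the function name and the toy record are hypothetical, and the counts it produces are not the PGAP statistics reported in this article.

```python
from collections import Counter


def count_feature_keys(genbank_text):
    """Count feature keys (CDS, tRNA, ...) in GenBank flat-file text."""
    counts = Counter()
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith(("ORIGIN", "CONTIG", "//")):
            in_features = False  # sequence data or end of record
            continue
        # Feature-key lines have exactly five leading spaces; qualifier
        # lines are indented further, so column 6 is blank for them.
        if (in_features and line.startswith("     ")
                and len(line) > 5 and line[5] != " "):
            counts[line[5:].split()[0]] += 1
    return counts


# Hypothetical toy record used only to exercise the parser.
TOY_RECORD = """\
LOCUS       TOY_RECORD               120 bp    DNA     linear   BCT 01-JAN-2020
FEATURES             Location/Qualifiers
     source          1..120
                     /organism="Streptomyces sp."
     gene            1..60
                     /locus_tag="TOY_0001"
     CDS             1..60
                     /locus_tag="TOY_0001"
                     /product="hypothetical protein"
     tRNA            70..100
                     /product="tRNA-Ala"
     rRNA            complement(101..120)
                     /product="16S ribosomal RNA"
ORIGIN
        1 acgtacgtac
//
"""

feature_counts = count_feature_keys(TOY_RECORD)
for key in ("CDS", "rRNA", "tRNA"):
    print(key, feature_counts[key])
```

A dedicated parser such as Biopython would be the more robust choice for real files; the sketch only shows where the per-genome counts come from.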

Data Records

Raw reads from short-read (Illumina) and long-read (PacBio) sequencing were deposited in the NCBI Sequence Read Archive (SRA) (Online-only Table 1)^24,25. The 30 complete genome sequences were deposited in GenBank via the NCBI’s submission portal (Table 3)^26–55. Detailed information on the predicted 922 smBGCs in 30 streptomycetes genomes has been deposited in FigShare⁵⁶.

Table 3 Summary of genome annotation.

Technical Validation

Streptomyces have drawn considerable attention because of their ability to produce various clinically important secondary metabolites. A total of 30 streptomycetes genomes were sequenced using both PacBio and Illumina sequencing methods to elucidate their biosynthetic potential. After cleaning the reads, on average 98,380 PacBio reads with 11,725 bp length and 18,223,235 Illumina reads with 100 bp length (50 bp for S. tsukubaensis) were generated (Fig. 1a,b and Online-only Table 1). Through the assembly of reads from the two sequencing platforms using the HGAP, CLC workbench, MAUVE, and GAP5 programs, single linear scaffolds ranging from 6.7 to 10.1 Mbp in length with 72% G + C content were obtained for 27 streptomycetes, whereas two scaffolds were finally constructed for the three remaining streptomycetes, S. clavuligerus (6.7 and 1.8 Mbp), S. albofaciens (4.8 and 4.5 Mbp), and S. filamentosus (5.7 and 2.1 Mbp) (Table 1). S. clavuligerus has been reported to have a large linear plasmid with a length of 1.8 Mbp, so its genome was considered to be correctly assembled into a single chromosome, while the S. albofaciens and S. filamentosus genomes appear to be assembled into two divided scaffolds^11,57. To increase the accuracy of the assembled genome sequences, Illumina sequences supported by more than 80% of the reads at the conflict sites were taken as the corrected ones (Table 1). Approximately 96.32% of Illumina sequencing reads were successfully mapped to the corresponding genomes (Table 1 and Fig. 2b). The completeness of the genomes was assessed using the BUSCO approach with a total of 352 orthologue groups from the Actinobacteria Dataset²¹. Results showed that 29 genomes have more than 99.1% gene space completeness and the S. clavuligerus genome has 97.4% gene space completeness (Table 2). Following NCBI PGAP, the 30 genomes were annotated with 7,000 protein coding genes, 20 rRNAs, and 68 tRNAs on average (Table 3).
Finally, based on the annotation, a total of 922 smBGCs were predicted in 30 streptomycetes genomes (Fig. 3). Detailed information, such as genomic positions, types, and putative products of each smBGC are publicly available in Figshare⁵⁶.
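As a rough sanity check on the read statistics quoted above, sequencing depth can be approximated as (number of reads × mean read length) / genome size. The sketch below combines the reported average read counts and lengths with the smallest and largest assembled genome sizes, so it only brackets the plausible depth; per-genome values in Online-only Table 1 will differ, and the function name is illustrative.

```python
def mean_depth(read_count, mean_read_length_bp, genome_size_bp):
    """Approximate fold-coverage: total sequenced bases / genome size."""
    return read_count * mean_read_length_bp / genome_size_bp


# Average read statistics after cleaning, as quoted in the text.
PACBIO = (98_380, 11_725)      # reads, mean length in bp
ILLUMINA = (18_223_235, 100)   # reads, read length in bp (100 bp libraries)

# Smallest and largest assembled genome sizes reported (6.7 and 10.1 Mbp).
GENOME_SIZES_BP = (6.7e6, 10.1e6)

for platform, (n_reads, read_length) in (("PacBio", PACBIO),
                                         ("Illumina", ILLUMINA)):
    depths = [mean_depth(n_reads, read_length, size)
              for size in GENOME_SIZES_BP]
    print(f"{platform}: roughly {min(depths):.0f}x to {max(depths):.0f}x")
```

Because the inputs are averages across 30 genomes paired with the extreme genome sizes, the printed ranges are an order-of-magnitude guide rather than measured coverage.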

Code availability

The versions and parameters of all bioinformatics tools used in this work are described in the Methods section.

References

1. Genilloud, O. The re-emerging role of microbial natural products in antibiotic discovery. Antonie Van Leeuwenhoek 106, 173–188 (2014).
2. Jones, S. E. & Elliot, M. A. Streptomyces exploration: Competition, volatile communication and new bacterial behaviours. Trends Microbiol 25, 522–531 (2017).
3. Hopwood, D. A. Soil to genomics: the Streptomyces chromosome. Annu Rev Genet 40, 1–23 (2006).
4. Lee, N. et al. Synthetic biology tools for novel secondary metabolite discovery in Streptomyces. J Microbiol Biotechnol 29, 667–686 (2019).
5. Ziemert, N., Alanjary, M. & Weber, T. The evolution of genome mining in microbes - a review. Nat Prod Rep 33, 988–1005 (2016).
6. Rebets, Y., Brotz, E., Tokovenko, B. & Luzhetskyy, A. Actinomycetes biosynthetic potential: how to bridge in silico and in vivo? J Ind Microbiol Biotechnol 41, 387–402 (2014).
7. Bentley, S. D. et al. Complete genome sequence of the model actinomycete Streptomyces coelicolor A3(2). Nature 417, 141–147 (2002).
8. Omura, S. et al. Genome sequence of an industrial microorganism Streptomyces avermitilis: deducing the ability of producing secondary metabolites. Proc Natl Acad Sci USA 98, 12215–12220 (2001).
9. Nett, M., Ikeda, H. & Moore, B. S. Genomic basis for natural product biosynthetic diversity in the actinomycetes. Nat Prod Rep 26, 1362–1384 (2009).
10. Reva, O. & Tummler, B. Think big–giant genes in bacteria. Environ Microbiol 10, 768–777 (2008).
11. Hwang, S. et al. Primary transcriptome and translatome analysis determines transcriptional and translational regulatory elements encoded in the Streptomyces clavuligerus genome. Nucleic Acids Res 47, 6114–6129 (2019).
12. Li, Y., Zhang, C., Liu, C., Ju, J. & Ma, J. Genome sequencing of Streptomyces atratus SCSIOZH16 and activation production of nocardamine via metabolic engineering. Front Microbiol 9, 1269 (2018).
13. Harrison, J. & Studholme, D. J. Recently published Streptomyces genome sequences. Microb Biotechnol 7, 373–380 (2014).
14. Barreiro, C. et al. Draft genome of Streptomyces tsukubaensis NRRL 18488, the producer of the clinically important immunosuppressant tacrolimus (FK506). J Bacteriol 194, 3756–3757 (2012).
15. Song, J. Y. et al. Draft genome sequence of Streptomyces clavuligerus NRRL 3585, a producer of diverse secondary metabolites. J Bacteriol 192, 6317–6318 (2010).
16. Koren, S. et al. Reducing assembly complexity of microbial genomes with single-molecule sequencing. Genome Biol 14, R101 (2013).
17. Ardui, S., Ameur, A., Vermeesch, J. R. & Hestand, M. S. Single molecule real-time (SMRT) sequencing comes of age: applications and utilities for medical diagnostics. Nucleic Acids Res 46, 2159–2168 (2018).
18. Chin, C. S. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10, 563–569 (2013).
19. Darling, A. C., Mau, B., Blattner, F. R. & Perna, N. T. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res 14, 1394–1403 (2004).
20. Bonfield, J. K. & Whitwham, A. Gap5–editing the billion fragment sequence assembly. Bioinformatics 26, 1699–1703 (2010).
21. Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
22. Haft, D. H. et al. RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Res 46, D851–D860 (2018).
23. Blin, K. et al. antiSMASH 4.0-improvements in chemistry prediction and gene cluster boundary identification. Nucleic Acids Res 45, W36–W41 (2017).
24. Leinonen, R., Sugawara, H., Shumway, M. & International Nucleotide Sequence Database Collaboration. The sequence read archive. Nucleic Acids Res 39, D19–21 (2011).
25. NCBI Sequence Read Archive. https://identifiers.org/ncbi/insdc.sra:SRP200324 (2019).
26. GenBank. https://identifiers.org/ncbi/insdc:CP020700 (2018).
27. GenBank. https://identifiers.org/ncbi/insdc:CP023688 (2018).
28. GenBank. https://identifiers.org/ncbi/insdc:CP023689 (2018).
29. GenBank. https://identifiers.org/ncbi/insdc:CP023690 (2018).
30. GenBank. https://identifiers.org/ncbi/insdc:CP023691 (2018).
31. GenBank. https://identifiers.org/ncbi/insdc:CP023692 (2018).
32. GenBank. https://identifiers.org/ncbi/insdc:CP023693 (2018).
33. GenBank. https://identifiers.org/ncbi/insdc:CP023694 (2018).
34. GenBank. https://identifiers.org/ncbi/insdc:CP023695 (2018).
35. GenBank. https://identifiers.org/ncbi/insdc:CP023696 (2018).
36. GenBank. https://identifiers.org/ncbi/insdc:CP023697 (2018).
37. GenBank. https://identifiers.org/ncbi/insdc:CP023698 (2018).
38. GenBank. https://identifiers.org/ncbi/insdc:CP023699 (2018).
39. GenBank. https://identifiers.org/ncbi/insdc:CP023700 (2018).
40. GenBank. https://identifiers.org/ncbi/insdc:CP023701 (2018).
41. GenBank. https://identifiers.org/ncbi/insdc:CP023702 (2018).
42. GenBank. https://identifiers.org/ncbi/insdc:CP023703 (2018).
43. GenBank. https://identifiers.org/ncbi/insdc:CP023747 (2018).
44. GenBank. https://identifiers.org/ncbi/insdc:CP029189 (2018).
45. GenBank. https://identifiers.org/ncbi/insdc:CP029190 (2018).
46. GenBank. https://identifiers.org/ncbi/insdc:CP029191 (2018).
47. GenBank. https://identifiers.org/ncbi/insdc:CP029192 (2018).
48. GenBank. https://identifiers.org/ncbi/insdc:CP029193 (2018).
49. GenBank. https://identifiers.org/ncbi/insdc:CP029194 (2018).
50. GenBank. https://identifiers.org/ncbi/insdc:CP029195 (2018).
51. GenBank. https://identifiers.org/ncbi/insdc:CP029196 (2018).
52. GenBank. https://identifiers.org/ncbi/insdc:CP029197 (2018).
53. GenBank. https://identifiers.org/ncbi/insdc:PDCL00000000 (2018).
54. GenBank. https://identifiers.org/ncbi/insdc:PDCM00000000 (2018).
55. GenBank. https://identifiers.org/ncbi/insdc:CP027858 (2019).
56. Lee, N. et al. Thirty complete Streptomyces genome sequences for mining novel secondary metabolite biosynthetic gene clusters. figshare. https://doi.org/10.6084/m9.figshare.c.4823394 (2020).
57. Medema, M. H. et al. The sequence of a 1.8-Mb bacterial linear plasmid reveals a rich evolutionary reservoir of secondary metabolic pathways. Genome Biol Evol 2, 212–224 (2010).

Acknowledgements

This work was supported by a grant from the Novo Nordisk Foundation (grant number NNF10CC1016517). This work was also supported by the Bio & Medical Technology Development Program (2018M3A9F3079664 to B.-K.C.) through the National Research Foundation (NRF) funded by the Ministry of Science and ICT (MSIT).

Author information

These authors contributed equally: Namil Lee and Woori Kim.

Authors and Affiliations

Department of Biological Sciences and KI for the BioCentury, Korea Advanced Institute of Science and Technology, Daejeon, 34141, Republic of Korea
Namil Lee, Woori Kim, Soonkyu Hwang, Yongjae Lee, Suhyung Cho & Byung-Kwan Cho
Intelligent Synthetic Biology Center, Daejeon, 34141, Republic of Korea
Byung-Kwan Cho
Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
Bernhard Palsson
Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
Bernhard Palsson
Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, 2800, Denmark
Bernhard Palsson & Byung-Kwan Cho

Contributions

B.-K.C. conceived and supervised the study. N.L. and B.-K.C. designed the experiments. N.L., W.K., S.H. and Y.L. performed the experiments. N.L., W.K., S.H., Y.L., S.C., B.P. and B.-K.C. analyzed the data. N.L., W.K., S.C., B.P. and B.-K.C. wrote the manuscript.

Corresponding author

Correspondence to Byung-Kwan Cho.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Online-only Table

Online-only Table 1 Summary of PacBio and Illumina genome sequencing data for 30 streptomycetes.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.

About this article

Cite this article

Lee, N., Kim, W., Hwang, S. et al. Thirty complete Streptomyces genome sequences for mining novel secondary metabolite biosynthetic gene clusters. Sci Data 7, 55 (2020). https://doi.org/10.1038/s41597-020-0395-9

Received: 11 September 2019
Accepted: 24 January 2020
Published: 13 February 2020
DOI: https://doi.org/10.1038/s41597-020-0395-9
