Letter | Published:

Novel soil bacteria possess diverse genes for secondary metabolite biosynthesis

Nature volume 558, pages 440–444 (2018)


In soil ecosystems, microorganisms produce diverse secondary metabolites such as antibiotics, antifungals and siderophores that mediate communication, competition and interactions with other organisms and the environment1,2. Most known antibiotics are derived from a few culturable microbial taxa3, and the biosynthetic potential of the vast majority of bacteria in soil has rarely been investigated4. Here we reconstruct hundreds of near-complete genomes from grassland soil metagenomes and identify microorganisms from previously understudied phyla that encode diverse polyketide and nonribosomal peptide biosynthetic gene clusters that are divergent from well-studied clusters. These biosynthetic loci are encoded by newly identified members of the Acidobacteria, Verrucomicrobia and Gemmatimonadetes, and the candidate phylum Rokubacteria. Bacteria from these groups are highly abundant in soils5,6,7, but have not previously been genomically linked to secondary metabolite production with confidence. In particular, large numbers of biosynthetic genes were characterized in newly identified members of the Acidobacteria, which is the most abundant bacterial phylum across soil biomes5. We identify two acidobacterial genomes from divergent lineages, each of which encodes an unusually large repertoire of biosynthetic genes with up to fifteen large polyketide and nonribosomal peptide biosynthetic loci per genome. To track the expression of genes encoding polyketide synthases and nonribosomal peptide synthetases in the soil ecosystem that we studied, we sampled 120 time points in a microcosm manipulation experiment and, using metatranscriptomics, found that gene clusters were differentially co-expressed in response to environmental perturbations. Transcriptional co-expression networks for specific organisms associated biosynthetic genes with two-component systems, transcriptional activation, putative antimicrobial resistance and iron regulation, linking metabolite biosynthesis to processes of environmental sensing and ecological competition. We conclude that the biosynthetic potential of abundant and phylogenetically diverse soil microorganisms has previously been underestimated. These organisms may represent a source of natural products that can address needs for new antibiotics and other pharmaceutical compounds.
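To illustrate the general idea of a co-expression network described in the abstract, the following minimal Python sketch links genes whose expression profiles across samples are strongly correlated. It is a simplification under stated assumptions, not the authors' method: the hard Pearson threshold of 0.9, the function names and the three example profiles are all hypothetical, and weighted-network packages apply more elaborate procedures.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate at or above the threshold."""
    edges = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_a], profiles[gene_b])
        if r >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical log-scaled expression values across five samples.
profiles = {
    "pks_gene": [1.0, 2.0, 3.0, 4.0, 5.0],
    "sensor_kinase": [2.1, 4.0, 6.2, 8.1, 9.9],
    "housekeeping": [5.0, 5.1, 4.9, 5.0, 5.0],
}
edges = coexpression_edges(profiles)
```

In this toy input only the hypothetical biosynthetic gene and sensor kinase rise and fall together, so they form the single edge; the near-constant housekeeping profile is not linked to either.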


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


References

  1. Hibbing, M. E., Fuqua, C., Parsek, M. R. & Brook Peterson, S. Bacterial competition: surviving and thriving in the microbial jungle. Nat. Rev. Microbiol. 8, 15–25 (2010).
  2. Charlop-Powers, Z., Owen, J. G., Reddy, B. V., Ternei, M. A. & Brady, S. F. Chemical–biogeographic survey of secondary metabolism in soil. Proc. Natl Acad. Sci. USA 111, 3757–3762 (2014).
  3. Cragg, G. M. & Newman, D. J. Natural products: a continuing source of novel drug leads. Biochim. Biophys. Acta 1830, 3670–3695 (2013).
  4. Rappé, M. S. & Giovannoni, S. J. The uncultured microbial majority. Annu. Rev. Microbiol. 57, 369–394 (2003).
  5. Fierer, N. Embracing the unknown: disentangling the complexities of the soil microbiome. Nat. Rev. Microbiol. 15, 579–590 (2017).
  6. Bergmann, G. T. et al. The under-recognized dominance of Verrucomicrobia in soil bacterial communities. Soil Biol. Biochem. 43, 1450–1455 (2011).
  7. Kielak, A. M., Barreto, C. C., Kowalchuk, G. A., van Veen, J. A. & Kuramae, E. E. The ecology of Acidobacteria: moving beyond genes and genomes. Front. Microbiol. 7, 744 (2016).
  8. Butterfield, C. N. et al. Proteogenomic analyses indicate bacterial methylotrophy and archaeal heterotrophy are prevalent below the grass root zone. PeerJ 4, e2687 (2016).
  9. DeBruyn, J. M., Nixon, L. T., Fawaz, M. N., Johnson, A. M. & Radosevich, M. Global biogeography and quantitative seasonal dynamics of Gemmatimonadetes in soil. Appl. Environ. Microbiol. 77, 6295–6300 (2011).
  10. Weber, T. et al. antiSMASH 3.0–a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res. 43, W237–W243 (2015).
  11. Medema, M. H. et al. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res. 39, W339–W346 (2011).
  12. Hadjithomas, M. et al. IMG-ABC: a knowledge base to fuel discovery of biosynthetic gene clusters and novel secondary metabolites. mBio 6, e00932-15 (2015).
  13. Cimermancic, P. et al. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell 158, 412–421 (2014).
  14. Wang, H., Fewer, D. P., Holm, L., Rouhiainen, L. & Sivonen, K. Atlas of nonribosomal peptide and polyketide biosynthetic pathways reveals common occurrence of nonmodular enzymes. Proc. Natl Acad. Sci. USA 111, 9259–9264 (2014).
  15. Parsley, L. C. et al. Polyketide synthase pathways identified from a metagenomic library are derived from soil Acidobacteria. FEMS Microbiol. Ecol. 78, 176–187 (2011).
  16. Rondon, M. R. et al. Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl. Environ. Microbiol. 66, 2541–2547 (2000).
  17. Charlop-Powers, Z. et al. Global biogeographic sampling of bacterial secondary metabolism. eLife 4, e05048 (2015).
  18. Fischbach, M. A. & Walsh, C. T. Assembly-line enzymology for polyketide and nonribosomal peptide antibiotics: logic, machinery, and mechanisms. Chem. Rev. 106, 3468–3496 (2006).
  19. Medema, M. H. et al. Minimum information about a biosynthetic gene cluster. Nat. Chem. Biol. 11, 625–631 (2015).
  20. Medema, M. H. et al. A systematic computational analysis of biosynthetic gene cluster evolution: lessons for engineering biosynthesis. PLoS Comput. Biol. 10, e1004016 (2014).
  21. Thaker, M. N. et al. Identifying producers of antibacterial compounds by screening for antibiotic resistance. Nat. Biotechnol. 31, 922–927 (2013).
  22. Johnston, C. W. et al. Assembly and clustering of natural antibiotics guides target identification. Nat. Chem. Biol. 12, 233–239 (2016).
  23. Gibson, M. K., Forsberg, K. J. & Dantas, G. Improved annotation of antibiotic resistance determinants reveals microbial resistomes cluster by ecology. ISME J. 9, 207–216 (2015).
  24. Skinnider, M. A., Merwin, N. J., Johnston, C. W. & Magarvey, N. A. PRISM 3: expanded prediction of natural product chemical structures from microbial genomes. Nucleic Acids Res. 45, W49–W54 (2017).
  25. Koskiniemi, S. et al. Rhs proteins from diverse bacteria mediate intercellular competition. Proc. Natl Acad. Sci. USA 110, 7032–7037 (2013).
  26. Claessen, D., de Jong, W., Dijkhuizen, L. & Wösten, H. A. Regulation of Streptomyces development: reach for the sky. Trends Microbiol. 14, 313–319 (2006).
  27. Zhang, Y., Ducret, A., Shaevitz, J. & Mignot, T. From individual cell motility to collective behaviors: insights from a prokaryote, Myxococcus xanthus. FEMS Microbiol. Rev. 36, 149–164 (2012).
  28. Wilson, M. C. et al. An environmental bacterial taxon with a large and distinct metabolic repertoire. Nature 506, 58–62 (2014).
  29. Unger, S. et al. The influence of precipitation pulses on soil respiration–assessing the “Birch effect” by stable carbon isotopes. Soil Biol. Biochem. 42, 1800–1810 (2010).
  30. Bray, N., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
  31. Klingenberg, H. & Meinicke, P. How to normalize metatranscriptomic count data for differential expression analysis. PeerJ 5, e3859 (2017).
  32. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
  33. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559 (2008).
  34. Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
  35. Bérdy, J. Bioactive microbial metabolites. J. Antibiot. (Tokyo) 58, 1–26 (2005).
  36. Bushnell, B. BBMap short read aligner. http://sourceforge.net/projects/bbmap (University of California, Berkeley, 2016).
  37. Joshi, N. A. & Fass, J. N. sickle - a windowed adaptive trimming tool for FastQ files (version 1.33) https://github.com/najoshi/sickle (2011).
  38. Andrews, S. FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2010).
  39. Peng, Y., Leung, H. C., Yiu, S. M. & Chin, F. Y. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420–1428 (2012).
  40. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  41. Brown, C. T. et al. Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523, 208–211 (2015).
  42. Wu, Y.-W., Simmons, B. A. & Singer, S. W. MaxBin 2.0: an automated binning algorithm to recover genomes from multiple metagenomic datasets. Bioinformatics 32, 605–607 (2016).
  43. Alneberg, J. et al. Binning metagenomic contigs by coverage and composition. Nat. Methods 11, 1144–1146 (2014).
  44. Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015).
  45. Sieber, C. M. K. et al. Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nat. Microbiol. https://doi.org/10.1038/s41564-018-0171-1 (2018).
  46. Banfield, J. Development of a Knowledgebase to Integrate, Analyze, Distribute, and Visualize Microbial Community Systems Biology Data. Report No. DOE-UCB-4918 (US Department of Energy, 2015).
  47. Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat. Commun. 7, 13219 (2016).
  48. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res. 25, 1043–1055 (2015).
  49. Hug, L. A. et al. A new view of the tree of life. Nat. Microbiol. 1, 16048 (2016).
  50. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
  51. Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
  52. Oksanen, J. et al. vegan: Community ecology package https://cran.r-project.org/package=vegan (2007).
  53. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).



We thank S. Spaulding for assistance with fieldwork, and M. Traxler and W. Zhang for helpful discussions. Sequencing was carried out under a Community Sequencing Project at the Joint Genome Institute. Funding was provided by the Office of Science, Office of Biological and Environmental Research, of the US Department of Energy Grant DOE-SC10010566, the Paul G. Allen Family Foundation and the Innovative Genomics Institute of the University of California, Berkeley.

Author information


  1. Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, USA

    • Alexander Crits-Christoph
  2. The Innovative Genomics Institute, University of California, Berkeley, Berkeley, CA, USA

    • Alexander Crits-Christoph & Jillian F. Banfield
  3. Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, CA, USA

    • Spencer Diamond, Cristina N. Butterfield, Brian C. Thomas & Jillian F. Banfield
  4. Department of Environmental Science, Policy, and Management, University of California, Berkeley, Berkeley, CA, USA

    • Jillian F. Banfield
  5. Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA

    • Jillian F. Banfield




A.C.-C. performed genomic and transcriptomic analysis; S.D. performed metagenome assembly and curation; C.N.B. performed microcosm experiments and RNA extractions; A.C.-C., S.D. and J.F.B. wrote the manuscript; B.C.T. supported the metagenomics bioinformatics work; and J.F.B. supervised the project.

Competing interests

The authors declare no competing interests.

Corresponding author

Correspondence to Jillian F. Banfield.

Extended data figures and tables

  1. Extended Data Fig. 1 Experimental plan and project overview.

    Schematic showing major components of microcosm time-point sampling and metagenomic analyses.

  2. Extended Data Fig. 2 NRPS and PKS biosynthetic loci of the Candidatus Eelbacter genome.

    Biosynthetic loci identified by both antiSMASH and PRISM from the Candidatus Eelbacter genome that contained at least 10 kb of biosynthetic genes. Predictions of the organization of the biosynthetic domains in each locus shown here were determined by PRISM. Smaller biosynthetic loci from this genome are not shown. Full names for the biosynthetic domains are given in Supplementary Table 11.

  3. Extended Data Fig. 3 NRPS and PKS biosynthetic loci of the Candidatus Angelobacter genome.

    Biosynthetic loci identified by both antiSMASH and PRISM from the Candidatus Angelobacter genome that contained at least 10 kb of biosynthetic genes. Predictions of the organization of the biosynthetic domains in each locus shown here were determined by PRISM. Smaller biosynthetic loci from this genome are not shown. Full names for the biosynthetic domains are given in Supplementary Table 11.

  4. Extended Data Fig. 4 Metatranscriptomics of NRPS and PKS proteins.

    The graph shows levels of transcriptional expression of genes containing NRPS and PKS protein domains across genomes from the four phyla of interest. Values are reported in log10-transformed transcripts per million and are summed across the 120 soil microcosm samples.

  5. Extended Data Fig. 5 Metatranscriptomics of the Candidatus Eelbacter genome.

    The levels of transcriptional expression of genes from biosynthetic gene clusters encoded in the Candidatus Eelbacter genome across 120 soil microcosm time-point samples grouped by extraction times (reported in hours) are shown. Expression levels are reported in log10-transformed transcripts per million.

  6. Extended Data Fig. 6 Differentially expressed biosynthetic gene clusters over time.

    The levels of expression of biosynthetic gene clusters from all organisms studied (excluding Candidatus Angelobacter data shown in Fig. 3a) that were found to be significantly differentially expressed between time points (PERMANOVA; n = 120; P < 0.05, FDR = 5%) across 120 soil microcosm time-point samples are shown. Expression levels are reported in log10 transcripts per million.

  7. Extended Data Fig. 7 Biosynthetic co-expression transcriptional module from Verrucomicrobia_AV7.

    A transcriptional network of co-expressed Verrucomicrobia_AV7 genes from a module found to be significantly enriched in genes from the biosynthetic gene clusters Verrucomicrobia_nrps_156 and Verrucomicrobia_nrps_157 (P < 0.05; hypergeometric distribution) is shown. Genes from the biosynthetic locus are outlined with a dashed line.
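The legend for Extended Data Fig. 7 reports module enrichment using a hypergeometric distribution. As a rough illustration of what such a test computes, the sketch below gives the one-sided probability of seeing at least the observed number of biosynthetic-cluster genes in a module drawn from a genome. The function name and all gene counts are hypothetical and are not taken from the paper; the authors' exact procedure is not described in this section.

```python
from math import comb


def hypergeom_enrichment_p(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: total genes in the genome
    successes:  genes belonging to the biosynthetic gene cluster(s)
    draws:      genes in the co-expression module
    observed:   cluster genes found inside the module
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 1,000 genes, 20 in the cluster, a 50-gene module
# containing 8 cluster genes (about 1 would be expected by chance).
p_value = hypergeom_enrichment_p(1000, 20, 50, 8)
enriched = p_value < 0.05
```

Exact integer binomial coefficients from math.comb avoid floating-point overflow for genome-sized inputs; when many modules are tested, the resulting P values would additionally need multiple-testing correction.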

Supplementary information

  1. Supplementary Information

    This file contains a guide to Supplementary Tables 1-12.

  2. Reporting Summary

  3. Supplementary Tables

    This file contains Supplementary Tables 1-12 – see Supplementary Information document for full table legends.
