Increased diversity of peptidic natural products revealed by modification-tolerant database search of mass spectra

  • Nature Microbiology volume 3, pages 319–327 (2018)
  • doi:10.1038/s41564-017-0094-2


Peptidic natural products (PNPs) include many antibiotics and other bioactive compounds. While the recent launch of the Global Natural Products Social (GNPS) molecular networking infrastructure is transforming PNP discovery into a high-throughput technology, PNP identification algorithms are needed to realize the potential of the GNPS project. GNPS relies on the assumption that each connected component of a molecular network (representing related metabolites) illuminates the ‘dark matter of metabolomics’ as long as it contains a known metabolite present in a database. We reveal a surprising diversity of PNPs produced by related bacteria and show that, contrary to the ‘comparative metabolomics’ assumption, two related bacteria are unlikely to produce identical PNPs (even though they are likely to produce similar PNPs). Since this observation undermines the utility of GNPS, we developed a PNP identification tool, VarQuest, that illuminates the connected components in a molecular network even if they do not contain known PNPs and only contain their variants. VarQuest reveals an order of magnitude more PNP variants than all previous PNP discovery efforts and demonstrates that GNPS already contains spectra from 41% of the currently known PNP families. The enormous diversity of PNPs suggests that biosynthetic gene clusters in various microorganisms constantly evolve to generate a unique spectrum of PNP variants that differ from PNPs in other species.


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.




We thank K. Vyatkina for fruitful discussions and A. Prjibelski for help with manuscript preparation. The work of A.G., A.M., A.S., A.K. and P.A.P. was supported by the Russian Science Foundation (grant 14-50-00069). The work of H.M. and P.A.P. was supported by the US National Institutes of Health (grant 2-P41-GM103484).

Author information


  1. Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia

    • Alexey Gurevich
    • Alla Mikheenko
    • Alexander Shlemov
    • Anton Korobeynikov
    • Pavel A. Pevzner
  2. Department of Mathematics and Mechanics, Saint Petersburg State University, Saint Petersburg, Russia

    • Anton Korobeynikov
  3. Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA

    • Hosein Mohimani
    • Pavel A. Pevzner
  4. Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA

    • Hosein Mohimani




A.G. implemented the VarQuest algorithm. A.S. and A.K. improved and sped up the DEREPLICATOR software. A.G., A.M. and H.M. designed the webserver. A.G. and A.M. did the VarQuest benchmarking. H.M. and P.A.P. designed and directed the work. A.G., H.M. and P.A.P. wrote the manuscript.

Competing interests

P.A.P. has an equity interest in Digital Proteomics, a company that may potentially benefit from the research results. The terms of this arrangement have been reviewed and approved by the University of California, San Diego in accordance with its conflict of interest policies.

Corresponding author

Correspondence to Pavel A. Pevzner.

Supplementary information

  1. Supplementary Information

    Supplementary Tables 1–18, Supplementary Figures 1–9 and Supplementary References.

  2. Life Sciences Reporting Summary