Sharing and community curation of mass spectrometry data with Global Natural Products Social Molecular Networking

Journal name: Nature Biotechnology
Volume: 34
Pages: 828–837
DOI: 10.1038/nbt.3597

Abstract

The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry (MS) techniques are well-suited to high-throughput characterization of NP, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of 'living data' through continuous reanalysis of deposited data.


Figures

    Figure 1: Overview of GNPS.

    (a) Representation of interactions among the NP community, GNPS spectral libraries, and GNPS data sets. At present, 221,083 MS/MS spectra from 18,163 unique compounds are used for searches in GNPS. These include third-party libraries, such as MassBank, ReSpect, and NIST, as well as spectral libraries created for GNPS (GNPS-Collections) and spectra from the NP community (GNPS-Community). GNPS spectral libraries grow through user contributions of new identifications of MS/MS spectra. To date, 55 community members have contributed 8,853 MS/MS spectra from 5,568 unique compounds (30.5% of the unique compounds available). In addition, ongoing curation efforts have already yielded 563 annotation updates for library spectra. These libraries are used to dereplicate compounds (that is, to recognize previously characterized and studied known compounds) in both public and private data. Dereplication is performed on all public data sets and the results are reported automatically, so users can query across all data sets, organisms, and conditions. Automatic reanalysis of all public data creates a virtuous cycle in which every new library contribution is matched against all public data. Combined with molecular networking (Fig. 3), this automatic reanalysis lets community members identify analogs that can then be added to GNPS spectral libraries. (b) The GNPS platform has grown to serve a global user base of >9,200 users from 100 countries.

    Figure 2: GNPS spectral libraries.

    (a) The computational resources of the metabolomics and NP communities fall into two main categories: first, reference collections (red dots) of MS/MS spectral libraries; and second, data repositories (blue dots) designed to publicly share raw MS data associated with research projects. Reference collection resources contribute and aggregate reference MS/MS spectra, and some also include data analysis tools, for example, online multi-spectrum MS/MS search (magnifying glass icon). Several resources have aggregated MS/MS spectra from various reference collections so that the analysis tools at each resource can leverage more of the community's annotation efforts (red and blue arrows). GNPS has imported all freely available reference collections (>221,000 MS/MS spectra) and makes them available for online analyses. GNPS and several other resources provide both reference MS/MS spectra and data openly and free of charge to the public (pink caps). (b) Comparison of spectral library sizes of available libraries (MassBank, ReSpect, and NIST) and GNPS libraries; GNPS-Collections includes newly acquired spectra from synthetic or purified compounds and GNPS-Community includes all community-contributed spectra. (c) Searching all public GNPS data sets revealed that MassBank, ReSpect, and NIST libraries matched 1,217 unique compounds; adding GNPS libraries increased unique compound matches by 41% (corresponding to 29% of total unique matches) with an accompanying 4% increase in spectral library size. Overall, GNPS libraries increase the total number of spectra matched in public data sets by 144% (59% of total public MS/MS matches), and spectrum matches across all GNPS public and private data by 767% (88% of all MS/MS matches). (d) The distribution of precursor masses in all GNPS public data sets is shown in gray and compared to the precursor mass distributions of MassBank, ReSpect, NIST, and GNPS libraries (color key as in b). Although GNPS libraries have a smaller combined size than MassBank, ReSpect, and NIST, they contain a higher proportion of molecules in the higher m/z range and therefore complement the other libraries, which are weighted toward lower precursor masses. (e) The quality of spectrum matches obtained by searching against the available spectral libraries is assessed by user ratings (1 to 4 stars; Supplementary Table 6) of continuous identification results. More than 98% of GNPS library matches received user ratings of >2.5 stars, which compares favorably with the 90% mark for NIST matches; that high mark for NIST also shows how important these third-party libraries remain to the GNPS platform. The lower mark for NIST matches does not suggest lower-quality spectra. It is more likely explained by NIST's greater emphasis on lower-precursor-mass molecules, whose spectra have fewer peaks and are generally harder to match.
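The parenthetical shares in panel (c) follow from the percentage increases: if adding GNPS libraries raises a count from B to B(1 + p), the added portion accounts for p/(1 + p) of the new total. A quick Python check, assuming each percentage is measured against the MassBank, ReSpect, and NIST baseline (the function name `share_of_total` is illustrative, not part of GNPS):

```python
def share_of_total(pct_increase: float) -> float:
    """Fraction of the new total contributed by the added portion.

    A count that grows from B to B * (1 + p) gains B * p,
    which is p / (1 + p) of the new total.
    """
    p = pct_increase / 100.0
    return p / (1.0 + p)


# Percentage increases quoted in the Figure 2c caption.
for label, pct in [("unique compounds", 41),
                   ("public spectrum matches", 144),
                   ("all spectrum matches", 767)]:
    print(f"{label}: +{pct}% -> {share_of_total(pct):.0%} of total")
```

Each printed share rounds to the value quoted in parentheses (29%, 59%, and 88%).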

    Figure 3: Molecular network creation and visualization.

    (a) Molecular networks are constructed from the alignment of MS/MS spectra to one another. Edges connecting nodes (MS/MS spectra) are defined by a modified cosine scoring scheme that determines the similarity of two MS/MS spectra, with scores ranging from 0 (totally dissimilar) to 1 (identical). MS/MS spectra are also searched against GNPS spectral libraries, seeding putative node matches in the molecular networks. Networks are visualized online in-browser or exported for third-party visualization software such as Cytoscape (ref. 31). (b) An example alignment between three MS/MS spectra of compounds with structural modifications that are captured by the modification-tolerant spectral matching used in variable dereplication and molecular networking. (c) In-browser molecular network visualization enables users to interactively explore molecular networks without requiring any external software. To date, >11,000 molecular networks have been analyzed using this feature. Within this interface, (i) users can define cohorts of input data, and nodes within the network are correspondingly represented as pie charts that visualize spectral count differences for each molecule across cohorts. (ii) Node labels indicate matches to GNPS spectral libraries, with additional information displayed on mouseover. These matches give users a starting point for annotating unidentified MS/MS spectra within the network. (iii) To facilitate identification of unknowns, users can display MS/MS spectra in the right panels by clicking on nodes in the network, giving direct interactive access to the underlying MS/MS peak data. Furthermore, spectrum-to-spectrum alignments are displayed in the top-right and bottom-right panels, offering insight into which underlying characteristics of the molecule could cause the observed differences in fragmentation.
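The modified cosine in panel (a) can be illustrated with a short Python sketch. It is a simplified version written for illustration, not the GNPS implementation: the function name `modified_cosine`, the 0.02 Da tolerance, the square-root intensity scaling, and the greedy one-to-one peak pairing are all assumptions, and the filtering and minimum-matched-peak rules a production tool would apply are omitted. What it shows is the core idea: a fragment peak may match either at the same m/z or shifted by the precursor mass difference, which is what lets spectra of modified analogs align.

```python
import math
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def _normalize(spec: List[Peak]) -> List[Peak]:
    """Square-root scale intensities (an assumption), then scale to unit norm."""
    weights = [math.sqrt(i) for _, i in spec]
    norm = math.sqrt(sum(w * w for w in weights)) or 1.0
    return [(mz, w / norm) for (mz, _), w in zip(spec, weights)]


def modified_cosine(spec_a: List[Peak], prec_a: float,
                    spec_b: List[Peak], prec_b: float,
                    tol: float = 0.02) -> float:
    """Simplified modified cosine between two MS/MS spectra (hypothetical helper).

    Peaks pair either at the same m/z or offset by the precursor mass
    difference; each peak is used at most once, so the score stays in [0, 1].
    """
    shift = prec_b - prec_a
    a, b = _normalize(spec_a), _normalize(spec_b)
    candidates = []
    for i, (mz_a, w_a) in enumerate(a):
        for j, (mz_b, w_b) in enumerate(b):
            if abs(mz_a - mz_b) <= tol or abs(mz_a + shift - mz_b) <= tol:
                candidates.append((w_a * w_b, i, j))
    # Greedily keep the highest-scoring pairs without reusing any peak.
    candidates.sort(reverse=True)
    used_a, used_b, score = set(), set(), 0.0
    for s, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            score += s
    return score


a = [(100.0, 50.0), (150.0, 100.0), (200.0, 30.0)]
b = [(100.0, 50.0), (164.0, 100.0), (214.0, 30.0)]  # two fragments carry a +14 Da shift
print(round(modified_cosine(a, 300.0, b, 314.0), 3))  # prints 1.0
print(round(modified_cosine(a, 300.0, b, 300.0), 3))  # no shift allowed: prints 0.278
```

With the +14 Da precursor difference taken into account, all three fragments align; treating the precursors as equal leaves only the unshifted peak at m/z 100 matched, which is why a plain cosine would miss this analog.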

    Figure 4: 'Living data' in GNPS by crowdsourcing molecular annotations.

    (a) A global snapshot of the state of MS/MS matching of public NP data sets available in GNPS using molecular networking and library search tools. Identified molecules (1.9% of the data) are MS/MS spectra that match library spectra with a cosine >0.7. Putative analog molecules (another 1.9% of the data) are MS/MS spectra that are not identified by library search but are immediate neighbors of identified MS/MS spectra in molecular networks. Identified networks (9.9% of the data) are connected components within a molecular network that have at least one spectrum matching library spectra. Unidentified networks (25.2% of the data) are molecular networks in which none of the spectra match library spectra; these networks potentially represent compound classes that have not yet been characterized. Exploratory networks (an additional 20.1% of the data) are unidentified connected components in molecular networks built with more relaxed parameters (Supplementary Table 8). Thus, 55.3% of the MS/MS spectra have at least one related MS/MS spectrum in spectral networks, whereas 44.7% have none. Each MS/MS spectrum in this remaining 44.7% has been observed in two separate instances and therefore should not constitute noise. Altogether, this analysis indicates that most of the chemical space captured by MS remains unexplored. (b) In the past year, substantial growth in the GNPS spectral libraries has driven an increase in the match rates of all public data. The number of unique compounds matched in the public data has increased tenfold, the number of total spectra matched has increased 22-fold, and the average match rate has increased threefold. Identification rates are expected to continue growing with further contributions from the community to the GNPS-Community spectral library.
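The category definitions in panel (a) can be expressed as a small graph computation. The Python sketch below is illustrative only: `classify_spectra`, its inputs, and the extra "singleton" label for spectra with no network neighbors are hypothetical. It assumes a precomputed molecular network and a best library-match cosine per spectrum, omits the exploratory networks (which require rebuilding the network with relaxed parameters), and gives each spectrum one exclusive label, which may differ from how the caption's percentages were tallied.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def classify_spectra(nodes: Iterable[str],
                     edges: Iterable[Tuple[str, str]],
                     library_cosine: Dict[str, float],
                     threshold: float = 0.7) -> Dict[str, str]:
    """Label each spectrum with Figure 4a-style categories (hypothetical helper).

    nodes: spectrum IDs; edges: molecular-network edges between spectrum IDs;
    library_cosine: best library-match cosine per spectrum (absent = no match).
    """
    node_list: List[str] = list(nodes)
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # "Identified": library match with cosine above the threshold (>0.7 in the caption).
    identified = {n for n in node_list if library_cosine.get(n, 0.0) > threshold}

    # Label connected components with an iterative depth-first search.
    component: Dict[str, int] = {}
    next_id = 0
    for start in node_list:
        if start in component:
            continue
        component[start] = next_id
        stack = [start]
        while stack:
            current = stack.pop()
            for neighbor in adj[current]:
                if neighbor not in component:
                    component[neighbor] = next_id
                    stack.append(neighbor)
        next_id += 1

    identified_components = {component[n] for n in identified}
    labels: Dict[str, str] = {}
    for n in node_list:
        if n in identified:
            labels[n] = "identified"
        elif adj[n] & identified:
            labels[n] = "putative analog"       # direct neighbor of an identified spectrum
        elif not adj[n]:
            labels[n] = "singleton"             # no related spectrum in the network
        elif component[n] in identified_components:
            labels[n] = "identified network"
        else:
            labels[n] = "unidentified network"
    return labels


labels = classify_spectra(
    nodes=["a", "b", "c", "d", "e", "f"],
    edges=[("a", "b"), ("b", "c"), ("d", "e")],
    library_cosine={"a": 0.85, "d": 0.6},
)
print(labels)
```

In this toy network, "a" is identified, its neighbor "b" is a putative analog, "c" belongs to an identified network, the "d"–"e" component is unidentified because 0.6 falls below the 0.7 cutoff, and "f" has no neighbors.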

    Figure 5: GNPS enabled discovery of stenothricin.

    (a) The stenothricin molecular family was identified during analysis of a molecular network of chemical extracts from S. roseosporus NRRL 15998 (green) and Streptomyces sp. DSM5940 (blue). This analysis indicates that Streptomyces sp. DSM5940 produces a compound structurally similar to stenothricin with a −41 Da m/z difference. An enlarged version of the network can be found in Supplementary Figure 8. (b) Based on preliminary structural analysis, stenothricin-GNPS (−41 Da) may contain a Lys-to-Ser substitution. (c) Comparison of the MS/MS spectra of stenothricin D and stenothricin-GNPS 2. (d) Although structurally related, stenothricin and stenothricin-GNPS have different effects on E. coli, as visualized using fluorescence microscopy. Red is the membrane stain FM4-64, blue is the membrane-permeable DNA stain DAPI (4′,6-diamidino-2-phenylindole), and green is the membrane-impermeable DNA stain SYTOX Green. SYTOX Green stains DNA only when the cell membrane is damaged. Scale bar, 2 μm.
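As a check on panel (b), the −41 Da shift matches the difference between the standard monoisotopic residue masses of lysine (C6H12N2O) and serine (C3H5NO2); these reference values are supplied here and are not part of the caption:

```latex
\Delta m = m_{\mathrm{res}}(\mathrm{Lys}) - m_{\mathrm{res}}(\mathrm{Ser})
         = 128.09496\ \mathrm{Da} - 87.03203\ \mathrm{Da}
         = 41.06293\ \mathrm{Da}
```

Mass agreement alone is consistent with, but does not establish, the substitution, which is why the caption describes the assignment as preliminary.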

References

  1. Bouslimani, A., Sanchez, L.M., Garg, N. & Dorrestein, P.C. Mass spectrometry of natural products: current, emerging and future technologies. Nat. Prod. Rep. 31, 718–729 (2014).
  2. Dictionary of Natural Products http://dnp.chemnetbase.com/ (2013).
  3. Laatsch, H. AntiBase 2012: The Natural Compound Identifiers (Wiley-VCH, 2011).
  4. Blunt, J. & Munro, M. MarinLit: a database of the marine natural products literature http://pubs.rsc.org/marinlit/ (Department Chem. Univ. Canterbury, Canterbury, New Zealand) (2003).
  5. Horai, H. et al. MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass Spectrom. 45, 703–714 (2010).
  6. Smith, C.A. et al. METLIN: a metabolite mass spectral database. Ther. Drug Monit. 27, 747–751 (2005).
  7. mzCloud: advanced mass spectral database https://www.mzcloud.org/.
  8. Sawada, Y. et al. RIKEN tandem mass spectral database (ReSpect) for phytochemicals: a plant-specific MS/MS-based data resource and database. Phytochemistry 82, 38–44 (2012).
  9. Benson, D.A. et al. GenBank. Nucleic Acids Res. 41, D36–D42 (2013).
  10. Magrane, M. & UniProt Consortium. UniProt Knowledgebase: a hub of integrated protein data. Database 2011, bar009 (2011).
  11. Lang, G. et al. Evolving trends in the dereplication of natural product extracts: new methodology for rapid, small-scale investigation of natural product extracts. J. Nat. Prod. 71, 1595–1599 (2008).
  12. Ito, T. & Masubuchi, M. Dereplication of microbial extracts and related analytical technologies. J. Antibiot. (Tokyo) 67, 353–360 (2014).
  13. Little, J.L., Williams, A.J., Pshenichnov, A. & Tkachenko, V. Identification of “known unknowns” utilizing accurate mass data and ChemSpider. J. Am. Soc. Mass Spectrom. 23, 179–185 (2012).
  14. Moree, W.J. et al. Interkingdom metabolic transformations captured by microbial imaging mass spectrometry. Proc. Natl. Acad. Sci. USA 109, 13811–13816 (2012).
  15. Watrous, J. et al. Mass spectral molecular networking of living microbial colonies. Proc. Natl. Acad. Sci. USA 109, E1743–E1752 (2012).
  16. Nguyen, D.D. et al. MS/MS networking guided analysis of molecule and gene cluster families. Proc. Natl. Acad. Sci. USA 110, E2611–E2620 (2013).
  17. Sidebottom, A.M., Johnson, A.R., Karty, J.A., Trader, D.J. & Carlson, E.E. Integrated metabolomics approach facilitates discovery of an unpredicted natural product suite from Streptomyces coelicolor M145. ACS Chem. Biol. 8, 2009–2016 (2013).
  18. Vizcaino, M.I., Engel, P., Trautman, E. & Crawford, J.M. Comparative metabolomics and structural characterizations illuminate colibactin pathway-dependent small molecules. J. Am. Chem. Soc. 136, 9244–9247 (2014).
  19. Wilson, M.C. et al. An environmental bacterial taxon with a large and distinct metabolic repertoire. Nature 506, 58–62 (2014).
  20. Engel, P., Vizcaino, M.I. & Crawford, J.M. Gut symbionts from distinct hosts exhibit genotoxic activity via divergent colibactin biosynthesis pathways. Appl. Environ. Microbiol. 81, 1502–1512 (2015).
  21. Yang, J.Y. et al. Molecular networking as a dereplication strategy. J. Nat. Prod. 76, 1686–1699 (2013).
  22. The National Institute of Standards and Technology. NIST standard reference database 1A http://www.nist.gov/srd/nist1a.cfm.
  23. Pruitt, K.D., Tatusova, T., Brown, G.R. & Maglott, D.R. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic Acids Res. 40, D130–D135 (2012).
  24. Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
  25. Bairoch, A. et al. The Universal Protein Resource (UniProt). Nucleic Acids Res. 33, D154–D159 (2005).
  26. Kersten, R.D. et al. Glycogenomics as a mass spectrometry-guided genome-mining method for microbial glycosylated molecules. Proc. Natl. Acad. Sci. USA 110, E4407–E4416 (2013).
  27. Guthals, A., Watrous, J.D., Dorrestein, P.C. & Bandeira, N. The spectral networks paradigm in high throughput mass spectrometry. Mol. Biosyst. 8, 2535–2544 (2012).
  28. Mascuch, S.J. et al. Direct detection of fungal siderophores on bats with white-nose syndrome via fluorescence microscopy-guided ambient ionization mass spectrometry. PLoS One 10, e0119668 (2015).
  29. Bandeira, N., Tsur, D., Frank, A. & Pevzner, P. Protein identification by spectral networks analysis. Proc. Natl. Acad. Sci. USA 104, 6140–6145 (2007).
  30. Winnikoff, J.R., Glukhov, E., Watrous, J., Dorrestein, P.C. & Gerwick, W.H. Quantitative molecular networking to profile marine cyanobacterial metabolomes. J. Antibiot. (Tokyo) 67, 105–112 (2014).
  31. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
  32. Kildgaard, S. et al. Accurate dereplication of bioactive secondary metabolites from marine-derived fungi by UHPLC-DAD-QTOFMS and a MS/HRMS library. Mar. Drugs 12, 3681–3705 (2014).
  33. Matsuda, F. et al. AtMetExpress development: a phytochemical atlas of Arabidopsis development. Plant Physiol. 152, 566–578 (2010).
  34. Haug, K. et al. MetaboLights--an open-access general-purpose repository for metabolomics studies and associated meta-data. Nucleic Acids Res. 41, D781–D786 (2013).
  35. Martens, L. et al. PRIDE: the proteomics identifications database. Proteomics 5, 3537–3545 (2005).
  36. Uchida, K. & Zähner, H. Metabolic products of microorganisms 142. A new antibiotic derinamycin, inhibitor of DNA and RNA synthesis. J. Antibiot. (Tokyo) 28, 266–273 (1975).
  37. Liu, W.-T. et al. MS/MS-based networking and peptidogenomics guided genome mining revealed the stenothricin gene cluster in Streptomyces roseosporus. J. Antibiot. (Tokyo) 67, 99–104 (2014).
  38. Marfey, P. Determination of D-amino acids. II. Use of a bifunctional reagent, 1,5-difluoro-2,4-dinitrobenzene. Carlsberg Res. Commun. 49, 591–596 (1984).
  39. Nonejuie, P., Burkart, M., Pogliano, K. & Pogliano, J. Bacterial cytological profiling rapidly identifies the cellular pathways targeted by antibacterial molecules. Proc. Natl. Acad. Sci. USA 110, 16169–16174 (2013).
  40. Lamsa, A., Liu, W.T., Dorrestein, P.C. & Pogliano, K. The Bacillus subtilis cannibalism toxin SDP collapses the proton motive force and induces autolysis. Mol. Microbiol. 84, 486–500 (2012).
  41. Purves, K. et al. Using molecular networking for microbial secondary metabolite bioprospecting. Metabolites 6, 2 (2016).
  42. Bertin, M.J. et al. Spongosine production by a Vibrio harveyi strain associated with the sponge Tectitethya crypta. J. Nat. Prod. 78, 493–499 (2015).
  43. Boudreau, P.D. et al. Expanding the described metabolome of the marine cyanobacterium Moorea producens JHB through orthogonal natural products workflows. PLoS One 10, e0133297 (2015).
  44. Kleigrewe, K. et al. Combining mass spectrometric metabolic profiling with genomic analysis: a powerful approach for discovering natural products from cyanobacteria. J. Nat. Prod. 78, 1671–1682 (2015).
  45. Duncan, K.R. et al. Molecular networking and pattern-based genome mining improves discovery of biosynthetic gene clusters and their products from Salinispora species. Chem. Biol. 22, 460–471 (2015).
  46. Vizcaino, M.I. & Crawford, J.M. The colibactin warhead crosslinks DNA. Nat. Chem. 7, 411–417 (2015).
  47. Klitgaard, A., Nielsen, J.B., Frandsen, R.J.N., Andersen, M.R. & Nielsen, K.F. Combining stable isotope labeling and molecular networking for biosynthetic pathway characterization. Anal. Chem. 87, 6520–6526 (2015).
  48. Anderton, C.R., Chu, R.K., Tolić, N., Creissen, A. & Paša-Tolić, L. Utilizing a robotic sprayer for high lateral and mass resolution MALDI FT-ICR MSI of microbial cultures. J. Am. Soc. Mass Spectrom. 27, 556–559 (2016).
  49. Liaimer, A. et al. Nostopeptolide plays a governing role during cellular differentiation of the symbiotic cyanobacterium Nostoc punctiforme. Proc. Natl. Acad. Sci. USA 112, 1862–1867 (2015).
  50. Liu, Y. et al. Diversity of aquatic Pseudomonas species and their activity against the fish pathogenic oomycete Saprolegnia. PLoS One 10, e0136241 (2015).
  51. He, X. et al. Cultivation of a human-associated TM7 phylotype reveals a reduced genome and epibiotic parasitic lifestyle. Proc. Natl. Acad. Sci. USA 112, 244–249 (2015).
  52. Cha, J.-Y. et al. Microbial and biochemical basis of a Fusarium wilt-suppressive soil. ISME J. 10, 119–129 (2016).
  53. Dührkop, K., Shen, H., Meusel, M., Rousu, J. & Böcker, S. Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc. Natl. Acad. Sci. USA 112, 12580–12585 (2015).
  54. Berman, H.M. et al. The Protein Data Bank. Nucleic Acids Res. 28, 235–242 (2000).
  55. Wishart, D.S. et al. HMDB: The human metabolome database. Nucleic Acids Res. 35, D521–D526 (2007).
  56. Sud, M. et al. Metabolomics Workbench: an international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training, and analysis tools. Nucleic Acids Res. 44, D463–D470 (2016).


Author information

  1. These authors contributed equally to this work.

    • Mingxun Wang,
    • Jeremy J Carver,
    • Vanessa V Phelan,
    • Laura M Sanchez,
    • Neha Garg &
    • Yao Peng

Affiliations

  1. Computer Science and Engineering, University of California (UC) San Diego, La Jolla, California, USA.

    • Mingxun Wang,
    • Jeremy J Carver &
    • Pavel Pevzner
  2. Center for Computational Mass Spectrometry, UC San Diego, La Jolla, California, USA.

    • Mingxun Wang,
    • Jeremy J Carver,
    • Pavel Pevzner,
    • Hosein Mohimani &
    • Nuno Bandeira
  3. Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, California, USA.

    • Vanessa V Phelan,
    • Laura M Sanchez,
    • Neha Garg,
    • Jeramie Watrous,
    • Tal Luzzatto-Knaan,
    • Carla Porto,
    • Amina Bouslimani,
    • Alexey V Melnik,
    • Michael J Meehan,
    • Laura A Pace,
    • David J Gonzalez,
    • Nobuhiro Koyama,
    • Kathleen Dorrestein,
    • Brendan M Duggan,
    • Jehad Almaliti,
    • William H Gerwick,
    • Bradley S Moore,
    • Pieter C Dorrestein &
    • Nuno Bandeira
  4. Department of Chemistry and Biochemistry, UC San Diego, La Jolla, California, USA.

    • Yao Peng,
    • Don Duy Nguyen,
    • Clifford A Kapono,
    • Cheng-Chih Hsu,
    • Dimitrios J Floros &
    • Yi Zeng
  5. Department of Microbiology and Immunology, Stanford University, Palo Alto, California, USA.

    • Wei-Ting Liu
  6. Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, UC San Diego, La Jolla, California, USA.

    • Max Crüsemann,
    • Paul D Boudreau,
    • Katherine R Duncan,
    • Karin Kleigrewe,
    • Lena Gerwick,
    • Charles B Larson,
    • Ellis C O'Neill,
    • Enora Briand,
    • Evgenia Glukhov,
    • Jenan J Kharbush,
    • Samantha J Mascuch,
    • Paul R Jensen,
    • William H Gerwick,
    • Bradley S Moore &
    • Pieter C Dorrestein
  7. Sirenas Marine Discovery, San Diego, California, USA.

    • Eduardo Esquenazi,
    • Egle Pociute,
    • Hailey Houson,
    • Lisa Vuong &
    • Venkat Macherla
  8. Centro de Ciencias Genómicas, Universidad Nacional Autonoma de Mexico, Cuernavaca, Mexico.

    • Mario Sandoval-Calderón &
    • Christian Sohlenkamp
  9. Salk Institute, La Jolla, California, USA.

    • Roland D Kersten
  10. Biology Department, San Diego State University, San Diego, California, USA.

    • Robert A Quinn
  11. Scottish Association for Marine Science, Scottish Marine Institute, Oban, UK.

    • Katherine R Duncan
  12. Center for Drug Discovery and Biodiversity, INDICASAT, City of Knowledge, Panama.

    • Ronnie G Gavilan,
    • Brian E Sedio,
    • Cristopher A Boya P,
    • Daniel Torres-Mendoza &
    • Marcelino Gutiérrez
  13. Genome Dynamics, Lawrence Berkeley National Laboratory, Berkeley, California, USA.

    • Trent Northen &
    • Stefan Jenkins
  14. FAS Center for Systems Biology, Harvard, Cambridge, Massachusetts, USA.

    • Rachel J Dutton
  15. Produits naturels – Synthèses – Chimie Médicinale, University of Rennes 1, Rennes Cedex, France.

    • Delphine Parrot &
    • Sophie Tomasi
  16. Department of Chemistry, University of Minnesota, Minneapolis, Minnesota, USA.

    • Erin E Carlson
  17. Dynamique des Génomes et Adaptation Microbienne, University of Lorraine, Vandœuvre-lès-Nancy, France.

    • Bertrand Aigle
  18. Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark.

    • Charlotte F Michelsen,
    • Lars Jelsbak,
    • Maria Maansson,
    • Andreas Klitgaard &
    • Kristian Fog Nielsen
  19. Microbial and Environmental Genomics, J. Craig Venter Institute, La Jolla, California, USA.

    • Anna Edlund
  20. School of Dentistry, UC Los Angeles, Los Angeles, California, USA.

    • Anna Edlund,
    • Jeffrey McLean &
    • Wenyuan Shi
  21. Department of Periodontics, University of Washington, Seattle, Washington, USA.

    • Jeffrey McLean
  22. Institute of Microbiology, ETH Zurich, Zurich, Switzerland.

    • Jörn Piel,
    • Eric J N Helfrich,
    • Florian Ryffel &
    • Julia A Vorholt
  23. Department of Medicinal Chemistry and Pharmacognosy, University of Illinois Chicago, Chicago, Illinois, USA.

    • Brian T Murphy &
    • Maryam Elfeki
  24. Department of Marine Biotechnology and Resources, National Sun Yat-sen University, Kaohsiung, Taiwan.

    • Chih-Chuang Liaw
  25. Agricultural Biotechnology Research Center, Academia Sinica, Taipei, Taiwan.

    • Yu-Liang Yang
  26. Institute of Food Chemistry, University of Münster, Münster, Germany.

    • Hans-Ulrich Humpf
  27. School of Chemical & Physical Sciences, and Centre for Biodiscovery, Victoria University of Wellington, Wellington, New Zealand.

    • Robert A Keyzers
  28. Gillings School of Global Public Health, Department of Epidemiology, University of North Carolina Chapel Hill, Chapel Hill, North Carolina, USA.

    • Amy C Sims &
    • Ralph Baric
  29. Department of Chemistry, Indiana University, Bloomington, Indiana, USA.

    • Andrew R Johnson &
    • Ashley M Sidebottom
  30. Smithsonian Tropical Research Institute, Ancón, Panama.

    • Brian E Sedio
  31. Skaggs School of Pharmacy and Pharmaceutical Sciences, UC San Diego, La Jolla, California, USA.

    • Charles B Larson,
    • David J Gonzalez,
    • Pieter C Dorrestein &
    • Nuno Bandeira
  32. School of Pharmaceutical Sciences of Ribeirao Preto, University of São Paulo, São Paulo, Brazil.

    • Denise B Silva,
    • Lucas M Marques,
    • Daniel P Demarque,
    • Ricardo R Silva,
    • Andrés M C Rodríguez &
    • Norberto P Lopes
  33. Centro de Ciencias Biologicas e da Saude, Universidade Federal de Mato Grosso do Sul, Campo Grande, Brazil.

    • Denise B Silva
  34. UMR CNRS 6553 ECOBIO, University of Rennes 1, Rennes Cedex, France.

    • Enora Briand
  35. Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana, USA.

    • Eve A Granatosky
  36. PBSci-Chemistry & Biochemistry Department, UC Santa Cruz, Santa Cruz, California, USA.

    • Kenji L Kurita &
    • Roger G Linington
  37. Department of Bioengineering, UC San Diego, La Jolla, California, USA.

    • Pep Charusanti &
    • Bernhard Ø Palsson
  38. Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, Oregon, USA.

    • Kerry L McPhail &
    • Oliver B Vining
  39. Department of Plant and Microbial Biology, UC Berkeley, Berkeley, California, USA.

    • Matthew F Traxler
  40. Department of Biological Sciences, Florida International University, Miami, Florida, USA.

    • Niclas Engene
  41. Department of Pharmaceutical Biotechnology, Helmholtz Institute for Pharmaceutical Research Saarland, Saarbrücken, Germany.

    • Thomas Hoffman &
    • Rolf Müller
  42. Center for Oceans and Human Health, Scripps Institution of Oceanography, UC San Diego, La Jolla, California, USA.

    • Vinayak Agarwal &
    • Bradley S Moore
  43. Department of Chemistry, University of Hawaii at Manoa, Honolulu, Hawaii, USA.

    • Philip G Williams,
    • Jingqui Dai,
    • Ram Neupane &
    • Joshua Gurr
  44. Division of Biological Sciences, UC San Diego, La Jolla, California, USA.

    • Anne Lamsa &
    • Kit Pogliano
  45. Department of Nanoengineering, UC San Diego, La Jolla, California, USA.

    • Chen Zhang
  46. School of Pharmaceutical Sciences, University of Geneva, Geneva, Switzerland.

    • Pierre-Marie Allard &
    • Jean-Luc Wolfender
  47. Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany.

    • Prasad Phapale &
    • Theodore Alexandrov
  48. Institut de Chimie des Substances Naturelles, CNRS-ICSN, UPR 2301, Labex CEBA, University of Paris-Saclay, Gif-sur-Yvette, France.

    • Louis-Felix Nothias &
    • Marc Litaudon
  49. Biological Sciences, Pacific Northwest National Laboratory, Richland, Washington, USA.

    • Jennifer E Kyle,
    • Thomas O Metz &
    • Katrina M Waters
  50. National Center for Advancing Translational Sciences, National Institutes of Health, Rockville, Maryland, USA.

    • Tyler Peryea,
    • Dac-Trung Nguyen,
    • Danielle VanLeer,
    • Paul Shinn &
    • Ajit Jadhav
  51. Institute of Microbiology, Chinese Academy of Sciences, Beijing, China.

    • Xueting Liu &
    • Lixin Zhang
  52. Department of Pediatrics, UC San Diego, La Jolla, California, USA.

    • Rob Knight

Contributions

Design and oversight of the project: P.C.D. and N.B. Algorithms: M.W. and N.B. Website: M.W. and J.J.C. In-house library acquisition and analysis: V.V.P., L.M.S., N.G., A.J., D.-T.N., D.V., E.E., E.P., H.H., P.S., T.P., V.M. User-curated library acquisition and analysis: A.C.S., A.E., J.M., W.S., W.-T.L., M.J.M., V.V.P., L.M.S., N.G., R.A.Q., A.B., C.P., T.L.-K., A.M.C.R., A.M., M.C., K.R.D., K.K., E.C.O'N., B.S.M., E.B., E.G., D.D.N., S.J.M., P.D.B., X.L., L.Z., H.-U.H., C.F.M., L.J., D.P., S.T., E.A.G., M.S.-C., C.S., K.L.K., P.-M.A., R.G.L., R.S.B., P.R.J., M.F.T., S.J., B.E.S., L.M.M., D.P.D., D.B.S., N.P.L., J.P., E.J.N.H., A.K., R.A.K., J.E.K., T.O.M., P.G.W., J.D., R.N., J.G., B.A., O.B.V., K.L.M., E.E.C., A.M.S., A.R.J., R.D.K., J.J.K., K.M.W., C.-C.H., M.M., C.-C.L., Y.-L.Y., A.V.M., C.B.L., D.J.G., F.R., H.M., J.-L.W., J.M., J.A., J.W., J.A.V., K.D., K.F.N., M.L., N.E., N.K., P. Pevzner, P. Phapale, R.J.D., R.B., R.M., R.G.G., T.A., T.H., T.N., V.A., W.H.G., Y.Z. Sample preparation, data generation, and website beta testing: A.E., W.-T.L., M.J.M., V.V.P., L.M.S., N.G., R.A.Q., A.B., C.P., T.L.-K., A.M.C.R., A.M., D.J.F., M.C., J.J.C., N.B., P.C.D., E.C.O'N., E.B., E.G., D.D.N., S.J.M., P.D.B., X.L., L.Z., C.Z., C.F.M., R.R.S., E.A.G., M.S.-C., C.S., D.P., S.T., P.-M.A., R.G.L., B.E.S., L.M.M., J.P., E.J.N.H., D.T.-M., C.A.B.P., M.E., B.T.M., O.B.V., K.L.M., E.E.C., A.M.S., A.R.J., K.R.D. GNPS documentation: M.W., V.V.P., L.M.S., C.A.K., D.D.N., R.R.S., L.A.P. Genome sequencing, assembly and targeted amplification: Y.P., P.C., R.G.G., M.G., B.Ø.P., L.G. Stenothricin GNPS data analysis: W.-T.L., V.V.P., L.M.S., Y.P., P.C.D. NMR acquisition and analysis: B.M.D., P.D.B., L.M.S. Marfey's analysis: Y.P., P.D.B. Microbiology: Y.P., A.C.S., R.S.B. Peptidogenomics analysis: Y.P., R.D.K., P.C.D. Fluorescence microscopy: Y.P., A.L., K.P. Writing of the paper: M.W., V.V.P., L.M.S., N.G., R.K., P.C.D., and N.B.

Competing financial interests

N.B. has an equity interest in Digital Proteomics, LLC, a company that may potentially benefit from the research results; Digital Proteomics, LLC, was not involved in any aspects of this research. The terms of this arrangement have been reviewed and approved by the University of California, San Diego, in accordance with its conflict-of-interest policies. E.E., E.P., H.H., L.V., and V.M. are employees of Sirenas MD. P.C.D. is on the advisory board for Sirenas MD. T.A. is the Scientific Director of SCiLS GmbH.


Supplementary information

PDF files

  1. Supplementary Text and Figures (6.64 MB)

    Supplementary Tables 1–4 and 6–14, Supplementary Figures 1–20, Supplementary Notes 1–12 and Supplementary Methods

Excel files

  1. Supplementary Table 5 (148 KB)

Zip files

  1. Supplementary Source Code (159 MB)
