A community resource for paired genomic and metabolomic data mining

Schorn, Michelle A.; Verhoeven, Stefan; Ridder, Lars; Huber, Florian; Acharya, Deepa D.; Aksenov, Alexander A.; Aleti, Gajender; Moghaddam, Jamshid Amiri; Aron, Allegra T.; Aziz, Saefuddin; Bauermeister, Anelize; Bauman, Katherine D.; Baunach, Martin; Beemelmanns, Christine; Beman, J. Michael; Berlanga-Clavero, María Victoria; Blacutt, Alex A.; Bode, Helge B.; Boullie, Anne; Brejnrod, Asker; Bugni, Tim S.; Calteau, Alexandra; Cao, Liu; Carrión, Víctor J.; Castelo-Branco, Raquel; Chanana, Shaurya; Chase, Alexander B.; Chevrette, Marc G.; Costa-Lotufo, Leticia V.; Crawford, Jason M.; Currie, Cameron R.; Cuypers, Bart; Dang, Tam; de Rond, Tristan; Demko, Alyssa M.; Dittmann, Elke; Du, Chao; Drozd, Christopher; Dujardin, Jean-Claude; Dutton, Rachel J.; Edlund, Anna; Fewer, David P.; Garg, Neha; Gauglitz, Julia M.; Gentry, Emily C.; Gerwick, Lena; Glukhov, Evgenia; Gross, Harald; Gugger, Muriel; Guillén Matus, Dulce G.; Helfrich, Eric J. N.; Hempel, Benjamin-Florian; Hur, Jae-Seoun; Iorio, Marianna; Jensen, Paul R.; Kang, Kyo Bin; Kaysser, Leonard; Kelleher, Neil L.; Kim, Chung Sub; Kim, Ki Hyun; Koester, Irina; König, Gabriele M.; Leao, Tiago; Lee, Seoung Rak; Lee, Yi-Yuan; Li, Xuanji; Little, Jessica C.; Maloney, Katherine N.; Männle, Daniel; Martin H., Christian; McAvoy, Andrew C.; Metcalf, Willam W.; Mohimani, Hosein; Molina-Santiago, Carlos; Moore, Bradley S.; Mullowney, Michael W.; Muskat, Mitchell; Nothias, Louis-Félix; O’Neill, Ellis C.; Parkinson, Elizabeth I.; Petras, Daniel; Piel, Jörn; Pierce, Emily C.; Pires, Karine; Reher, Raphael; Romero, Diego; Roper, M. Caroline; Rust, Michael; Saad, Hamada; Saenz, Carmen; Sanchez, Laura M.; Sørensen, Søren Johannes; Sosio, Margherita; Süssmuth, Roderich D.; Sweeney, Douglas; Tahlan, Kapil; Thomson, Regan J.; Tobias, Nicholas J.; Trindade-Silva, Amaro E.; van Wezel, Gilles P.; Wang, Mingxun; Weldon, Kelly C.; Zhang, Fan; Ziemert, Nadine; Duncan, Katherine R.; Crüsemann, Max; Rogers, Simon; Dorrestein, Pieter C.; Medema, Marnix H.; van der Hooft, Justin J. J.

doi:10.1038/s41589-020-00724-z

Comment
Open access
Published: 15 February 2021

Nature Chemical Biology volume 17, pages 363–368 (2021)

Genomics and metabolomics are widely used to explore specialized metabolite diversity. The Paired Omics Data Platform is a community initiative to systematically document links between metabolome and (meta)genome data, aiding identification of natural product biosynthetic origins and metabolite structures.

Interactions among bacteria, fungi, plants, and animals, and between these organisms and their environments, are often mediated by specialized metabolites, also known as natural products. These molecules are not strictly required for the survival of the organisms that produce them but may confer an advantage, for example by inhibiting nearby species that compete for nutritional resources. The chemical structures, functions, and biosynthetic origins of such metabolites remain largely unknown, especially in complex environments. To understand and harness these chemical interactions, it is crucial to study their genetic and structural bases. However, confidently recognizing, dereplicating, and prioritizing specialized metabolites in complex mixtures remains very challenging. While individual efforts to interpret the chemical and genetic languages have been largely successful in connecting genes and molecules¹,², large-scale correlations leveraging complementary chemical and genomic data have yet to be realized.

The research community has generated a wealth of genomic and metabolomic data, which has been deposited in dedicated repositories, and tools for mining these data separately are being developed rapidly. Platforms such as the antibiotics and Secondary Metabolite Analysis Shell (antiSMASH)³ and PRediction Informatics for Secondary Metabolomes (PRISM)⁴ use genomic information to annotate biosynthetic gene clusters (BGCs): groups of physically clustered genes that together encode the biosynthetic machinery for metabolites of diverse chemical classes, such as polyketides, peptides and terpenoids. The antiSMASH database and the Joint Genome Institute’s (JGI’s) Integrated Microbial Genomes and Microbiomes (IMG/M)/Atlas of Biosynthetic Gene Clusters (IMG/ABC) database⁵ contain tens of thousands of BGCs identified in publicly available genomes, while the Minimum Information about a Biosynthetic Gene cluster (MIBiG)⁶ database connects over 2,000 BGCs to the specialized metabolites whose biosynthetic pathways they encode. On the metabolomics side, mass spectrometry (MS) has become the most commonly used technique for high-throughput measurements². Data repositories and analysis platforms such as the Global Natural Product Social Molecular Networking-Mass Spectrometry Interactive Virtual Environment (GNPS-MassIVE)⁷, MetaboLights⁸, and the Metabolomics Workbench⁹ facilitate the sharing, processing, and analysis of MS data. These platforms, along with spectral libraries² such as the GNPS spectral library, METLIN, MassBank, and the commercially available NIST library, provide reference mass spectra for a wide range of chemical structures, thereby aiding metabolite annotation. Together, these resources provide the basis for sharing and reusing genomic and metabolomic data and structural annotations, and they have spurred the development of numerous algorithms for mining these information-dense data.

Several studies and tools have started to explore the combination of genomic and metabolomic data to enhance metabolite annotation, dereplication, and prioritization workflows. While MS-based metabolomics provides increasing amounts of information about the metabolite structures present in complex mixtures, it faces inherent limitations with respect to structural identification. To address this, tools such as GNPS-based molecular networking⁷ and mass spectrometry to Latent Dirichlet Allocation (MS2LDA) substructure discovery¹⁰ computationally exploit tandem mass spectrometry (MS/MS) fragmentation spectra to map relationships between metabolites in networks and to identify (shared) substructures, thereby facilitating metabolite annotation. Genomics has also been used to provide complementary structural information through the biotransformations encoded in biosynthetic machinery¹, as well as to link specialized metabolites to their producers via BGCs mined from the genome sequences of known organisms. Integrative strategies have been described for bacterial¹¹, fungal¹², and plant¹³ specialized metabolites. Over the last decade, a series of tools and approaches that integrate genome and metabolome data have been introduced, mostly targeting biosynthetically modular natural products such as peptides and glycosides; examples include peptidogenomics¹¹, MetaMiner¹⁴, GRAPE-GARLIC¹⁵ and metabologenomics¹⁶. These tools show the potential of combined omics approaches to accelerate natural product discovery.

It has become standard procedure to deposit genomic information in public databases, such as the National Center for Biotechnology Information’s (NCBI’s) GenBank¹⁷ or JGI’s IMG/M⁵, and it is becoming increasingly common to submit mass spectrometry data to repositories such as GNPS-MassIVE⁷, MetaboLights⁸ or the Metabolomics Workbench⁹. However, there is currently no straightforward way to connect different types of omics data derived from the same biological source. Determining which omics datasets belong to the same species, organism, or sample, and therefore constitute ‘paired’ datasets, often requires an extensive literature review, which makes reuse of these data challenging and time consuming. Nor is there a straightforward way to obtain consistent metadata for such links. To facilitate large-scale, effective integration of these data, a community-driven online resource that stores annotated links between paired datasets is vital. Here, we define paired data as genomic data (specifically a genome or metagenome assembly) and metabolomic data (specifically MS/MS data) that originate from the same source. So far, no such platform supporting natural product discovery has been available. The value of integrating different data types and organizing sample metadata is increasingly recognized by the scientific community. For example, the BioStudies¹⁸ and BioSample¹⁹ databases facilitate the capture and organization of various omics data types and sample information. In particular, the BioStudies database supports linkage between genomics and metabolomics studies; however, links between genome-mining resources, such as MIBiG, and natural product metabolomics platforms, such as GNPS-MassIVE, are currently not documented in this database.

Here we introduce the Paired Omics Data Platform (PoDP; https://pairedomicsdata.bioinformatics.nl/), which streamlines access to paired omics data so that both humans and computers can find and read paired datasets, and which also records validated links between BGCs and metabolites so that they can be exploited. In addition to linking these omics data types, the platform stores essential metadata (e.g., growth medium, extraction solvent, and ionization mode) using existing ontologies where available, so that users can reuse just the sections of paired data relevant to their question. This platform will facilitate the integration of unsupervised data-mining strategies to fine-tune the structural annotation of modular natural product classes and to extend annotation to yet-unknown classes of natural products. This will aid in the structural and functional annotation of natural products and the genes responsible for their production, and we anticipate that it will help uncover the potential producers of molecules in nature. Finally, registering these links in a standardized way gives the community an invaluable resource of Findable, Accessible, Interoperable, and Reusable (FAIR)²⁰ data.

Standards for paired data

The aim of the PoDP is to connect public metabolomics datasets to their genomic origins. The PoDP does not store any metabolomics or genomics datasets; instead, it captures metadata defining pairs of omics datasets held in existing public databases and platforms that are already validated and used by the genomics and metabolomics communities. The PoDP consists of a six-section form for easy and quick data input (Fig. 1). The metadata are organized in projects, each of which can consist of multiple related experiments identified by their MassIVE accession or MetaboLights study identifier. The (meta)genome(s) used in these experiments can all be added to the same project via a public database identifier (e.g., an NCBI GenBank accession number or JGI Genome ID), and the user creates an easy-to-recall genome label for each (meta)genome. Minimal metadata on sample preparation and data collection are recorded in a modular way, allowing for multiple experimental set-ups within one project. Furthermore, through BioSample accession IDs, metadata stored elsewhere can also be linked to the (meta)genome(s). User-specified metadata labels are likewise used for easy recall in the linking step, in which a URL for a specific set of MS spectra is linked with a genome label and metadata labels to create a genome–metabolome link. To create a BGC–MS/MS link, a MIBiG identifier for the same or a similar BGC can be linked with an MS/MS URL and the scan number of a single measured molecule, or with the molecular network nodes (each representing a unique measured molecule) of a molecular family (a group of structurally related molecules identified by similar fragmentation patterns). Because a BGC–MS/MS link requires a MIBiG identifier, this approach also encourages the submission of validated gene clusters to the MIBiG repository.
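The project structure described above can be pictured as a small data model. The Python sketch below is a hypothetical, simplified illustration, not the PoDP's actual schema or code: the class, method, and field names are invented for this example, and the accessions and URLs in the usage lines are placeholders. It captures two ideas from the text: links are made by recalling labels defined earlier in the form, and a BGC–MS/MS link pairs a MIBiG accession with an MS/MS file URL and a scan number.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical, simplified model of a PoDP project (not the real schema)."""

    metabolomics_id: str  # e.g. a MassIVE accession or MetaboLights study ID
    genomes: dict[str, str] = field(default_factory=dict)  # label -> GenBank/JGI ID
    methods: set[str] = field(default_factory=set)  # user-defined metadata labels
    genome_metabolome_links: list[tuple[str, str, str]] = field(default_factory=list)
    bgc_msms_links: list[tuple[str, str, int]] = field(default_factory=list)

    def add_genome(self, label: str, accession: str) -> None:
        """Register a (meta)genome under an easy-to-recall label."""
        self.genomes[label] = accession

    def link_genome_to_spectra(self, ms_url: str, genome_label: str,
                               method_label: str) -> None:
        """Create a genome-metabolome link from previously defined labels."""
        if genome_label not in self.genomes:
            raise KeyError(f"unknown genome label: {genome_label}")
        if method_label not in self.methods:
            raise KeyError(f"unknown metadata label: {method_label}")
        self.genome_metabolome_links.append((ms_url, genome_label, method_label))

    def link_bgc_to_spectrum(self, mibig_id: str, ms_url: str, scan: int) -> None:
        """Create a BGC-MS/MS link: MIBiG accession + MS/MS file URL + scan number."""
        if not mibig_id.startswith("BGC"):
            raise ValueError(f"expected a MIBiG accession, got: {mibig_id}")
        self.bgc_msms_links.append((mibig_id, ms_url, scan))


# Usage with placeholder identifiers (not real accessions):
project = Project("MSV-placeholder")
project.add_genome("strain_A", "GenBank-placeholder")
project.methods.add("ISP2_methanol_positive")
project.link_genome_to_spectra("https://example.org/strain_A.mzML",
                               "strain_A", "ISP2_methanol_positive")
project.link_bgc_to_spectrum("BGC0000000", "https://example.org/strain_A.mzML", 1234)
```

Validating labels at link time mirrors the form's design, in which genomes and methods are declared once and then referenced by label.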

**Fig. 1: Overview of the Paired Omics Data Platform.**

By obtaining iterative feedback from early users in various research groups, we narrowed down the metadata required by the PoDP to the minimum needed to make meaningful links between genomic and metabolomic data for the community. Capturing the full range of relevant variables in any given experiment in a standardized, machine-readable format would make data entry very complex and tedious. We therefore struck a balance between flexible, user-friendly data entry and machine readability for future large-scale analyses. By standardizing, and connecting to ontologies, only the information most likely to substantially affect which metabolites are produced, extracted, and detected by MS, we arrived at a set of minimal metadata required for submission.

To make the data machine readable, ontologies are used to standardize response options wherever possible. This ensures that a global community uses the same term for a given piece of metadata and can use these ontologies to make accurate and meaningful selections of data to analyze. For example, researchers can reliably select only datasets that used tryptic soy broth for culture, only metagenomic datasets derived from aquatic invertebrates, or just the fraction of paired datasets in which the MS data were acquired in positive ionization mode. For metadata categories with numerous options, for which not all possibilities can be captured by standard ontologies, an “Other” category is provided for further explanation. Free text entered in the “Other” boxes is inherently not machine readable, but it allows users to customize entries and helps keep important but non-standardized records of the paired data. Furthermore, all fields, including the “Other” boxes, can be searched to find projects containing specific data.
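Ontology-backed fields turn selections like these into exact matches, whereas free text in the “Other” boxes can only be found by searching. The Python sketch below contrasts the two; the records, field names, and plain-text values are hypothetical (a real ontology would identify each term by a term ID rather than a label).

```python
# Hypothetical records: standardized fields plus a free-text "Other" note.
DATASETS = [
    {"id": "p1", "medium": "tryptic soy broth", "ionization": "positive", "other": ""},
    {"id": "p2", "medium": "Other", "ionization": "negative", "other": "seawater agar"},
    {"id": "p3", "medium": "ISP2", "ionization": "positive", "other": ""},
]


def select(datasets, **criteria):
    """Return IDs of datasets whose standardized fields equal every criterion."""
    return [d["id"] for d in datasets
            if all(d.get(key) == value for key, value in criteria.items())]


def free_text_search(datasets, term):
    """Case-insensitive substring search over all fields, including "Other"."""
    term = term.lower()
    return [d["id"] for d in datasets
            if any(term in str(value).lower() for value in d.values())]


# select(DATASETS, ionization="positive")                          -> ['p1', 'p3']
# select(DATASETS, medium="tryptic soy broth", ionization="positive") -> ['p1']
# free_text_search(DATASETS, "seawater")                            -> ['p2']
```

Only the first two queries are reliable selections; the third depends on how a user happened to phrase the free text.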

Preliminary dataset statistics

An initial call to deposit paired datasets in the PoDP was met with enthusiasm from the research community. Over 45 laboratories from 10 countries have contributed 70 paired datasets. Those 70 projects (Box 1) contain 4,853 MS samples associated with sequenced source material. Of the more than 2,600 different genomic sources deposited, 1,306 are metagenomes, 1,268 are genomes, and 42 are metagenome-assembled genomes. The impressive collection of over 4,800 genome–metabolome links is accompanied by metadata: 155 sample preparation methods, 100 extraction methods, and 75 instrumentation methods. Furthermore, 114 links between BGCs and their associated MS/MS spectra are registered in the platform. These community-curated data are regularly archived to a Zenodo dataset and made available for download in JSON format.
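Because the archive is distributed as JSON, statistics like these can be recomputed directly from a download. The Python sketch below assumes a hypothetical, heavily simplified JSON layout (the field names are illustrative and do not reproduce the real PoDP export format) and tallies projects, genomic sources by type, and both kinds of links. Applied to the reported figures, the same tally gives 1,306 + 1,268 + 42 = 2,616 genomic sources, consistent with “more than 2,600”.

```python
import json
from collections import Counter

# Hypothetical, heavily simplified export; field names are illustrative only
# and do not reproduce the real PoDP JSON schema.
EXPORT = """
[
  {"project": "A",
   "genomes": [{"type": "genome"}, {"type": "genome"}],
   "genome_metabolome_links": 3,
   "bgc_msms_links": 1},
  {"project": "B",
   "genomes": [{"type": "metagenome"}, {"type": "metagenome-assembled genome"}],
   "genome_metabolome_links": 2,
   "bgc_msms_links": 0}
]
"""


def summarize(projects: list[dict]) -> dict:
    """Tally projects, genomic sources by type, and both kinds of links."""
    by_type = Counter(g["type"] for p in projects for g in p["genomes"])
    return {
        "projects": len(projects),
        "genomic_sources": sum(by_type.values()),
        "by_type": dict(by_type),
        "genome_metabolome_links": sum(p["genome_metabolome_links"] for p in projects),
        "bgc_msms_links": sum(p["bgc_msms_links"] for p in projects),
    }


stats = summarize(json.loads(EXPORT))
```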

The PoDP encourages adherence to the FAIR principles²⁰ by requiring data to be deposited in databases and made publicly available before being entered in the PoDP. Presence of a project in the PoDP will increase the findability of those data, results, and publications, while allowing researchers to perform new analyses on existing publicly available data without generating new data. As part of this community effort, a number of projects deposited in the PoDP made their data publicly available to allow submission to the platform; thus far, over 680 metabolomics samples and over 70 genomic sources, including five BGCs newly uploaded to MIBiG, were made public. For example, the PoDP stimulated the upload to MassIVE of metabolomics data for a collection of 120 sequenced Streptomyces strains whose genomic data had previously been published²¹. In another example, 20 metagenomes from marine sediments were made public for the platform. Additionally, some datasets were acquired and made publicly available expressly for deposition into the PoDP. In one case, a research group with 44 already sequenced cyanobacterial strains²² was inspired to acquire metabolomics data for each strain so that the paired data could be uploaded to the PoDP.

To explore the data in the PoDP, users can search for projects under the “List” tab, using keywords to find studies of interest. For example, to find paired data from Streptomyces or Salinispora species, searching for the genera (“Streptomyces | Salinispora”) returns the projects (currently 18) that measured Streptomyces or Salinispora strains. Likewise, searching “methanol + cells” retrieves projects that used methanol to extract cell pellets. To see more detail on the metadata in each project, users can navigate to the project page by clicking on the project identifier. There, clicking on a genome or metagenome label shows its details, including a link to the publicly available genomic data. Likewise, the publicly available MS data can be downloaded directly from the link provided. Clicking on the Sample Growth, Extraction, and Instrumentation Methods labels displays the corresponding metadata.
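The two example queries combine terms with `|` (either term) and `+` (both terms). The Python sketch below shows one way such queries could be evaluated over each project's metadata text. It assumes that `|` means OR, `+` means AND, AND binds more tightly than OR, and matching is case-insensitive substring matching; these semantics and the project texts are illustrative assumptions, not a description of the PoDP's search backend.

```python
def matches(query: str, text: str) -> bool:
    """Return True if `text` satisfies `query` under the assumed semantics:
    '|' separates alternatives (OR), '+' joins required terms (AND),
    and AND binds more tightly than OR."""
    haystack = text.lower()
    alternatives = [alt.strip() for alt in query.lower().split("|")]
    return any(
        all(term.strip() in haystack for term in alt.split("+") if term.strip())
        for alt in alternatives if alt
    )


def search(query: str, projects: dict[str, str]) -> list[str]:
    """Return the IDs of projects whose metadata text matches `query`."""
    return [pid for pid, text in projects.items() if matches(query, text)]


# Hypothetical projects, each reduced to one string of searchable metadata.
PROJECTS = {
    "p1": "Streptomyces coelicolor; extraction: methanol; material: cells",
    "p2": "Salinispora arenicola; extraction: ethyl acetate; material: supernatant",
    "p3": "Pseudomonas sp.; extraction: methanol; material: supernatant",
}

# search("Streptomyces | Salinispora", PROJECTS) -> ['p1', 'p2']
# search("methanol + cells", PROJECTS)           -> ['p1']
```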

Applications of the platform

The PoDP can be used in both basic and advanced ways. At a basic level, researchers from across disciplines can use linked data for numerous applications (Fig. 2). By linked data, we mean a BGC that can be experimentally linked to an MS/MS spectrum or a molecular family. For example, a natural product chemist who isolates a molecule from a cyanobacterium can use the PoDP to find mass spectra from genetically similar cyanobacteria for comparative metabolomics analyses. A biologist who has identified a BGC of interest and has MS data for the producing strain can download data for similar BGCs and their products to determine whether the BGC is novel and/or to guide molecule isolation. Scientists from all fields can find reliable paired data for use in their own research while also contributing their data for future community use. The importance of consistent metadata cannot be overstated, and we welcome the development of curated resources such as the Natural Products Atlas²³ that aim to create coherent records for microbial natural products. Combined with the PoDP, such resources give researchers complementary ways to mine for natural product structures, their producers, and available omics data.

**Fig. 2: Example use cases of the Paired Omics Data Platform.**

Furthermore, more advanced applications are possible using large-scale computational approaches (Fig. 2). Several algorithmic strategies to link genomics and metabolomics data to chart specialized metabolic diversity have been suggested, including correlation- and feature-based matching². Both types of linking benefit from systematically curated datasets of related organisms with BGCs and metabolites occurring across various samples. With the PoDP now in place, appropriate datasets can be selected more effectively to start mining for novel links with these strategies. Moreover, algorithms to score and rank links between BGCs and metabolites are easier to develop and benchmark: for example, a new set of scores was recently proposed using a number of PoDP datasets with validated BGC–metabolite links to demonstrate the effect of the novel scoring system within the newly introduced NPLinker framework²⁴.
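As a concrete, hypothetical illustration of correlation-based matching, the Python sketch below scores each pair of BGC (or gene cluster family) and molecular family by the Pearson correlation of their presence/absence patterns across the same strains (the phi coefficient) and ranks the pairs. This is a generic strategy chosen for illustration, not the scoring functions proposed for NPLinker²⁴; the names GCF_1 and MF_a and the presence data are invented. A benchmark of the kind described above could check how highly a validated BGC–metabolite link from the PoDP is ranked.

```python
from math import sqrt


def phi(x: list[int], y: list[int]) -> float:
    """Pearson correlation of two equal-length, non-empty 0/1 vectors
    (the phi coefficient). Returns 0.0 when either vector is constant,
    because the correlation is undefined in that case."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def rank_links(bgc_presence: dict[str, list[int]],
               mf_presence: dict[str, list[int]]) -> list[tuple[float, str, str]]:
    """Score every BGC/molecular-family pair over the same ordered strains
    and return (score, bgc, molecular_family) tuples, best first."""
    scored = [(phi(b, m), bgc, mf)
              for bgc, b in bgc_presence.items()
              for mf, m in mf_presence.items()]
    return sorted(scored, key=lambda t: t[0], reverse=True)


# Hypothetical presence (1) / absence (0) across four strains, same order.
BGCS = {"GCF_1": [1, 1, 0, 0], "GCF_2": [1, 0, 1, 0]}
MFS = {"MF_a": [1, 1, 0, 0], "MF_b": [0, 1, 0, 1]}

ranked = rank_links(BGCS, MFS)
# ranked[0]  -> (1.0, 'GCF_1', 'MF_a')
# ranked[-1] -> (-1.0, 'GCF_2', 'MF_b')
```

With only a handful of strains, perfect correlations can arise by chance, which is one reason large, consistently annotated collections of paired datasets matter for this kind of approach.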

Moving forward with FAIR data

The amount of preliminary data deposited and the enthusiasm from the community for the PoDP reaffirm the need for such a repository of paired public datasets. Feedback from early users also indicated an eagerness to include additional kinds of data in the future. Presently, the PoDP is expressly for linking MS/MS data and whole-genome or metagenome data. The PoDP could potentially be extended to include other types of spectral data, such as full-scan (MS¹) mass spectrometry data and NMR data, as well as proteomics data. Additionally, different kinds of genomic data could be accommodated, including 16S rRNA or other amplicon sequences, transcriptomic data, and genetic manipulation or heterologous expression data. Such additions will further fuel integrated omics analysis tools and approaches, a field that has gained much traction recently²⁵.

The PoDP requires researchers to deposit their data in public databases, which stimulated uploads by early users, as exemplified by more than 1,800 GNPS-MassIVE and MetaboLights submissions made just before these data were entered in the PoDP. As a FAIR data platform, the PoDP not only facilitates reuse of data but also promotes the work of researchers who submit their data, through increased publication visibility. Future efforts to (re)use these data by connecting to other platforms and programs for analyzing paired data, such as NPLinker²⁴, will further the field of natural product prediction and discovery.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Each project can be downloaded from the website individually as a JSON file. The (meta)genome and metabolome datasets can be found in their public repositories. All PoDP projects are archived monthly to Zenodo at https://doi.org/10.5281/zenodo.3736430.

Code availability

The software is licensed under the Apache 2.0 open source license and the source code can be found on GitHub (https://github.com/iomega/paired-data-form), which includes the dependencies of the software. Each software release is archived to Zenodo at https://doi.org/10.5281/zenodo.2656630. A full description of how the platform was built can be found on https://pairedomicsdata.bioinformatics.nl/methods.

References

1. Tietz, J. I. & Mitchell, D. A. Curr. Top. Med. Chem. 16, 1645–1694 (2016).
2. van der Hooft, J. J. J. et al. Chem. Soc. Rev. 49, 3297–3314 (2020).
3. Blin, K. et al. Nucleic Acids Res. 47, W81–W87 (2019).
4. Skinnider, M. A., Merwin, N. J., Johnston, C. W. & Magarvey, N. A. Nucleic Acids Res. 45, W49–W54 (2017).
5. Palaniappan, K. et al. Nucleic Acids Res. 48, D422–D430 (2020).
6. Kautsar, S. A. et al. Nucleic Acids Res. 48, D454–D458 (2020).
7. Wang, M. et al. Nat. Biotechnol. 34, 828–837 (2016).
8. Haug, K. et al. Nucleic Acids Res. 48, D440–D444 (2020).
9. Sud, M. et al. Nucleic Acids Res. 44, D463–D470 (2016).
10. van der Hooft, J. J. J., Wandy, J., Barrett, M. P., Burgess, K. E. V. & Rogers, S. Proc. Natl. Acad. Sci. USA 113, 13738–13743 (2016).
11. Kersten, R. D. et al. Nat. Chem. Biol. 7, 794–802 (2011).
12. Hautbergue, T., Jamin, E. L., Debrauwer, L., Puel, O. & Oswald, I. P. Nat. Prod. Rep. 35, 147–173 (2018).
13. Jeon, J. E. et al. Cell 180, 176–187.e19 (2020).
14. Cao, L. et al. Cell Syst. 9, 600–608.e4 (2019).
15. Dejong, C. A. et al. Nat. Chem. Biol. 12, 1007–1014 (2016).
16. Goering, A. W. et al. ACS Cent. Sci. 2, 99–108 (2016).
17. Benson, D. A. et al. Nucleic Acids Res. 41, D36–D42 (2013).
18. Sarkans, U. et al. Nucleic Acids Res. 46, D1266–D1270 (2018).
19. Barrett, T. et al. Nucleic Acids Res. 40, D57–D63 (2012).
20. Wilkinson, M. D. et al. Sci. Data 3, 160018 (2016).
21. Chevrette, M. G. et al. Nat. Commun. 10, 516 (2019).
22. Shih, P. M. et al. Proc. Natl. Acad. Sci. USA 110, 1053–1058 (2013).
23. van Santen, J. A. et al. ACS Cent. Sci. 5, 1824–1833 (2019).
24. Eldjárn, G. H. et al. Ranking microbial metabolomic and genomic links using correlation-based and feature-based scoring functions. Preprint at bioRxiv https://doi.org/10.1101/2020.06.12.148205 (2020).
25. Misra, B. B., Langefeld, C., Olivier, M. & Cox, L. A. J. Mol. Endocrinol. 62, R21–R45 (2019).

Acknowledgements

The research reported in this publication was supported by an ASDI eScience Grant (ASDI.2017.030) from the Netherlands eScience Center (to J.J.J.v.d.H. and M.H.M.), a National Institutes of Health (NIH) Genome to Natural Products Network supplementary award (no. U01GM110706 to M.H.M.), a Wageningen Graduate School Postdoc Talent Program fellowship (to M.A.S.), a Marie Skłodowska-Curie Individual Fellowship from the European Union (MSCA-IF-EF-ST-897121 to M.A.S.), the National Science Foundation (NSF) (1817955 to L.M.S. and 1817887 to R.J.D.), a Fundação para a Ciência e a Tecnologia (FCT) fellowship (SFRH/BD/136367/2018 to R.C.B.), the National Cancer Institute of the NIH (award no. F32CA221327 to M.W.M.), the University of California, San Diego, Scripps Institution of Oceanography, and two grants from the NIH (Awards GM118815 and 107550 to L.G.), and the National Center for Complementary and Integrative Health of the NIH (award no. R01AT009143 to R.J.T. and N.L.K.).

Author information

These authors contributed equally: Michelle A. Schorn, Stefan Verhoeven.

Authors and Affiliations

Laboratory of Microbiology, Department of Agricultural and Food Sciences, Wageningen University, Wageningen, the Netherlands
Michelle A. Schorn
Bioinformatics Group, Wageningen University, Wageningen, the Netherlands
Michelle A. Schorn, Marnix H. Medema & Justin J. J. van der Hooft
Netherlands eScience Center, Amsterdam, the Netherlands
Stefan Verhoeven, Lars Ridder & Florian Huber
Wisconsin Institute for Discovery and Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI, USA
Deepa D. Acharya, Shaurya Chanana & Marc G. Chevrette
Collaborative Mass Spectrometry Innovation Center, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
Alexander A. Aksenov, Allegra T. Aron, Anelize Bauermeister, Asker Brejnrod, Julia M. Gauglitz, Emily C. Gentry, Tiago Leao, Louis-Félix Nothias, Daniel Petras, Mingxun Wang, Kelly C. Weldon & Pieter C. Dorrestein
Department of Psychiatry, University of California San Diego, San Diego, CA, USA
Gajender Aleti
Leibniz Institute for Natural Product Research and Infection Biology e.V. Hans-Knöll-Institute (HKI), Jena, Germany
Jamshid Amiri Moghaddam & Christine Beemelmanns
Pharmaceutical Biology Department, Pharmaceutical Institute, Eberhard Karls University Tübingen, Tübingen, Germany
Saefuddin Aziz, Harald Gross, Leonard Kaysser, Daniel Männle & Hamada Saad
Microbiology Department, Biology Faculty, Jenderal Soedirman University, Purwokerto, Indonesia
Saefuddin Aziz
Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo, Brazil
Anelize Bauermeister & Leticia V. Costa-Lotufo
Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
Katherine D. Bauman, Alexander B. Chase, Tristan de Rond, Alyssa M. Demko, Lena Gerwick, Evgenia Glukhov, Dulce G. Guillén Matus, Paul R. Jensen, Tiago Leao, Bradley S. Moore, Mitchell Muskat, Raphael Reher, Douglas Sweeney & Pieter C. Dorrestein
University of Potsdam, Institute of Biochemistry and Biology, Potsdam-Golm, Germany
Martin Baunach & Elke Dittmann
Department of Life and Environmental Sciences, University of California Merced, Merced, CA, USA
J. Michael Beman
Sierra Nevada Research Institute, University of California Merced, Merced, CA, USA
J. Michael Beman
Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora”, Universidad de Málaga-Consejo Superior de Investigaciones Científicas, Departamento de Microbiología, Universidad de Málaga, Málaga, Spain
María Victoria Berlanga-Clavero, Carlos Molina-Santiago & Diego Romero
Department of Microbiology and Plant Pathology, University of California Riverside, Riverside, CA, USA
Alex A. Blacutt, Christopher Drozd & M. Caroline Roper
Molecular Biotechnology, Department of Biosciences, Goethe University Frankfurt, Frankfurt am Main, Germany
Helge B. Bode & Eric J. N. Helfrich
Buchmann Institute for Molecular Life Sciences, Goethe University Frankfurt, Frankfurt am Main, Germany
Helge B. Bode
Senckenberg Gesellschaft für Naturforschung, Frankfurt am Main, Germany
Helge B. Bode, Eric J. N. Helfrich & Nicholas J. Tobias
Max-Planck-Institute for Terrestrial Microbiology, Department of Natural Products in Organismic Interactions, Marburg, Germany
Helge B. Bode
Institut Pasteur, Collection of Cyanobacteria, Paris, France
Anne Boullie & Muriel Gugger
Pharmaceutical Sciences Division, University of Wisconsin-Madison, Madison, WI, USA
Tim S. Bugni & Fan Zhang
Laboratoire d’Analyses Bioinformatiques pour la Génomique et le Métabolisme, Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, Evry, France
Alexandra Calteau
Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
Liu Cao, Yi-Yuan Lee & Hosein Mohimani
Microbial Biotechnology, Institute of Biology, Leiden University, Leiden, the Netherlands
Víctor J. Carrión, Chao Du & Gilles P. van Wezel
Department of Microbial Ecology, Netherlands Institute of Ecology, Wageningen, the Netherlands
Víctor J. Carrión
Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
Raquel Castelo-Branco
Faculty of Sciences, University of Porto, Porto, Portugal
Raquel Castelo-Branco
Department of Microbiology, University of Helsinki, Helsinki, Finland
Raquel Castelo-Branco & David P. Fewer
Department of Chemistry, Yale University, New Haven, CT, USA
Jason M. Crawford & Chung Sub Kim
Chemical Biology Institute, Yale University, West Haven, CT, USA
Jason M. Crawford & Chung Sub Kim
Department of Microbial Pathogenesis, Yale School of Medicine, New Haven, CT, USA
Jason M. Crawford
Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, USA
Cameron R. Currie
Department of Energy Great Lakes Bioenergy Research Center, Wisconsin Energy Institute, University of Wisconsin-Madison, Madison, WI, USA
Cameron R. Currie
Adrem Data Lab, Department of Computer Science, University of Antwerp, Antwerp, Belgium
Bart Cuypers
Molecular Parasitology Unit, Department of Biomedical Sciences, Institute of Tropical Medicine, Antwerp, Belgium
Bart Cuypers & Jean-Claude Dujardin
Technische Universität Berlin, Institut für Chemie, Berlin, Germany
Tam Dang, Benjamin-Florian Hempel & Roderich D. Süssmuth
Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
Rachel J. Dutton & Emily C. Pierce
Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
Rachel J. Dutton, Bradley S. Moore, Kelly C. Weldon & Pieter C. Dorrestein
J. Craig Venter Institute, Genomic Medicine Group, La Jolla, CA, USA
Anna Edlund
Department of Pediatrics, School of Medicine, University of California San Diego, La Jolla, CA, USA
Anna Edlund
School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA, USA
Neha Garg & Andrew C. McAvoy
Institute of Microbiology, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
Eric J. N. Helfrich, Jörn Piel & Michael Rust
Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Harvard University, Boston, MA, USA
Eric J. N. Helfrich
Charité, University Medicine Berlin, Berlin-Brandenburg Center for Regenerative Therapy (BCRT), Campus Virchow Klinikum, Berlin, Germany
Benjamin-Florian Hempel
Korean Lichen Research Institute, Sunchon National University, Sunchon, Republic of Korea
Jae-Seoun Hur
Naicons Srl, Milano, Italy
Marianna Iorio & Margherita Sosio
College of Pharmacy, Sookmyung Women’s University, Seoul, Korea
Kyo Bin Kang
German Centre for Infection Research (DZIF), Tübingen, Germany
Leonard Kaysser, Daniel Männle & Nadine Ziemert
Department of Chemistry, Northwestern University, Evanston, IL, USA
Neil L. Kelleher, Michael W. Mullowney & Regan J. Thomson
School of Pharmacy, Sungkyunkwan University, Suwon, Republic of Korea
Chung Sub Kim, Ki Hyun Kim & Seoung Rak Lee
Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
Irina Koester & Daniel Petras
Institute for Pharmaceutical Biology, University of Bonn, Bonn, Germany
Gabriele M. König & Max Crüsemann
Department of Chemistry, Princeton University, Princeton, NJ, USA
Seoung Rak Lee
Section of Microbiology, University of Copenhagen, Copenhagen, Denmark
Xuanji Li & Søren Johannes Sørensen
Department of Pharmaceutical Sciences, University of Illinois at Chicago, Chicago, IL, USA
Jessica C. Little & Laura M. Sanchez
Department of Chemistry, Point Loma Nazarene University, San Diego, CA, USA
Katherine N. Maloney
Interfaculty Institute for Microbiology and Infection Medicine Tübingen, Microbiology and Biotechnology, University of Tübingen, Tübingen, Germany
Daniel Männle & Nadine Ziemert
Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología, Panama, Republic of Panama
Christian Martin H.
Carl R. Woese Institute for Genomic Biology and Department of Microbiology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
Willam W. Metcalf
Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA, USA
Bradley S. Moore
School of Chemistry, University of Nottingham, Nottingham, UK
Ellis C. O’Neill
Department of Chemistry and Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN, USA
Elizabeth I. Parkinson
Instituto Federal de Santa Catarina, Florianópolis, Santa Catarina, Brazil
Karine Pires
Phytochemistry and Plant Systematics Department, Division of Pharmaceutical Industries, National Research Centre, Cairo, Egypt
Hamada Saad
The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Carmen Saenz
Department of Biology, Memorial University of Newfoundland, St. John’s, Canada
Kapil Tahlan
LOEWE-Centre for Translational Biodiversity Genomics, Frankfurt am Main, Germany
Nicholas J. Tobias
Departamento de Fisiologia e Farmacologia, Faculdade de Medicina, Universidade Federal do Ceará, Fortaleza, Ceará, Brazil
Amaro E. Trindade-Silva
University of Strathclyde, Strathclyde Institute of Pharmacy and Biomedical Sciences, Glasgow, UK
Katherine R. Duncan
School of Computing Science, University of Glasgow, Glasgow, UK
Simon Rogers
Department of Pharmacology and Pediatrics, University of California San Diego, La Jolla, CA, USA
Pieter C. Dorrestein

Authors

Michelle A. Schorn
Stefan Verhoeven
Lars Ridder
Florian Huber
Deepa D. Acharya
Alexander A. Aksenov
Gajender Aleti
Jamshid Amiri Moghaddam
Allegra T. Aron
Saefuddin Aziz
Anelize Bauermeister
Katherine D. Bauman
Martin Baunach
Christine Beemelmanns
J. Michael Beman
María Victoria Berlanga-Clavero
Alex A. Blacutt
Helge B. Bode
Anne Boullie
Asker Brejnrod
Tim S. Bugni
Alexandra Calteau
Liu Cao
Víctor J. Carrión
Raquel Castelo-Branco
Shaurya Chanana
Alexander B. Chase
Marc G. Chevrette
Leticia V. Costa-Lotufo
Jason M. Crawford
Cameron R. Currie
Bart Cuypers
Tam Dang
Tristan de Rond
Alyssa M. Demko
Elke Dittmann
Chao Du
Christopher Drozd
Jean-Claude Dujardin
Rachel J. Dutton
Anna Edlund
David P. Fewer
Neha Garg
Julia M. Gauglitz
Emily C. Gentry
Lena Gerwick
Evgenia Glukhov
Harald Gross
Muriel Gugger
Dulce G. Guillén Matus
Eric J. N. Helfrich
Benjamin-Florian Hempel
Jae-Seoun Hur
Marianna Iorio
Paul R. Jensen
Kyo Bin Kang
Leonard Kaysser
Neil L. Kelleher
Chung Sub Kim
Ki Hyun Kim
Irina Koester
Gabriele M. König
Tiago Leao
Seoung Rak Lee
Yi-Yuan Lee
Xuanji Li
Jessica C. Little
Katherine N. Maloney
Daniel Männle
Christian Martin H.
Andrew C. McAvoy
Willam W. Metcalf
Hosein Mohimani
Carlos Molina-Santiago
Bradley S. Moore
Michael W. Mullowney
Mitchell Muskat
Louis-Félix Nothias
Ellis C. O’Neill
Elizabeth I. Parkinson
Daniel Petras
Jörn Piel
Emily C. Pierce
Karine Pires
Raphael Reher
Diego Romero
M. Caroline Roper
Michael Rust
Hamada Saad
Carmen Saenz
Laura M. Sanchez
Søren Johannes Sørensen
Margherita Sosio
Roderich D. Süssmuth
Douglas Sweeney
Kapil Tahlan
Regan J. Thomson
Nicholas J. Tobias
Amaro E. Trindade-Silva
Gilles P. van Wezel
Mingxun Wang
Kelly C. Weldon
Fan Zhang
Nadine Ziemert
Katherine R. Duncan
Max Crüsemann
Simon Rogers
Pieter C. Dorrestein
Marnix H. Medema
Justin J. J. van der Hooft

Contributions

J.J.J.v.d.H., M.H.M., and P.C.D. conceived and managed the project. S.V., M.A.S., and J.J.J.v.d.H. wrote code and developed the platform. M.H.M., L.R., F.H., and J.J.J.v.d.H. supervised the platform building. All other authors contributed data to the platform, tested it, and provided suggestions on how to improve it. M.A.S., S.V., P.C.D., M.H.M., and J.J.J.v.d.H. wrote the manuscript, and all authors contributed to editing the manuscript.

Corresponding authors

Correspondence to Pieter C. Dorrestein, Marnix H. Medema or Justin J. J. van der Hooft.

Ethics declarations

Competing interests

M.H.M. is a co-founder of Design Pharmaceuticals and a member of the scientific advisory board of Hexagon Bio. P.C.D. is a member of the scientific advisory boards of Sirenas and Cybele. N.L.K., W.W.M., and R.J.T. are on the board of directors of MicroMGx. M.W. is a founder of Ometa Labs LLC. A.A.A. is a consultant for Ometa Labs and Clarity Genomics and a co-founder of Arome Sciences Inc. William Gerwick, spouse of L.G., has an equity interest in Sirenas Marine Discovery, Inc. and NMRFinder, companies that may potentially benefit from the research results, and also serves on the companies’ respective scientific advisory boards. The terms of this last arrangement have been reviewed and approved by the University of California, San Diego (USA), in accordance with its conflict of interest policies.

Supplementary information

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Schorn, M.A., Verhoeven, S., Ridder, L. et al. A community resource for paired genomic and metabolomic data mining. Nat Chem Biol 17, 363–368 (2021). https://doi.org/10.1038/s41589-020-00724-z

Published: 15 February 2021
Issue Date: April 2021
DOI: https://doi.org/10.1038/s41589-020-00724-z