Abstract
The throughput of mass spectrometers and the amount of publicly available metabolomics data are growing rapidly, but analysis tools such as molecular networking and the Mass Spectrometry Search Tool (MASST) do not scale to searching and clustering the billions of mass spectra in metabolomics repositories. To address this limitation, we designed MASST+ and Networking+, which can process datasets up to three orders of magnitude larger than those processed by state-of-the-art tools.
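To make the scaling problem concrete, the following minimal Python sketch shows a brute-force spectral search in which a query spectrum is scored against every spectrum in a library. It is an illustration under stated assumptions and not the MASST+ algorithm: the cosine score on binned peaks, the bin width, the score threshold and all function names are hypothetical choices made for this example.

```python
import math
from collections import defaultdict

BIN_WIDTH = 0.01  # assumed m/z bin width, chosen for illustration only


def bin_spectrum(peaks, bin_width=BIN_WIDTH):
    """Map a list of (m/z, intensity) peaks to a unit-norm sparse vector."""
    vec = defaultdict(float)
    for mz, intensity in peaks:
        vec[int(round(mz / bin_width))] += intensity
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm > 0 else {}


def cosine(a, b):
    """Dot product of two unit-norm sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def brute_force_search(query, library, threshold=0.7):
    """Score the query against every library spectrum (hypothetical baseline)."""
    q = bin_spectrum(query)
    hits = []
    for name, peaks in library.items():
        score = cosine(q, bin_spectrum(peaks))
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda h: -h[1])


# Toy data, invented for this example.
query = [(100.05, 10.0), (150.10, 5.0)]
library = {
    "same": [(100.05, 20.0), (150.10, 10.0)],
    "other": [(200.20, 1.0)],
}
hits = brute_force_search(query, library)
```

In this sketch each query touches every library spectrum, so the cost of a single search grows linearly with the size of the repository. That linear growth is the kind of cost that becomes prohibitive when a repository holds billions of spectra.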
Data availability
The datasets analyzed are available at gnps.ucsd.edu. Accession codes related to the lanthipeptides part of the study are MSV000090476, MSV000090473, MSV000090472, MSV000090471, MSV000090457, MSV000089818, MSV000089817, MSV000089816, MSV000089815, MSV000089813, MSV000088816, MSV000088801, MSV000088800, MSV000088764 and MSV000088763. For comparing MASST+ and Networking+ against previous state-of-the-art tools, datasets MSV000078787, clustered GNPS, and unclustered GNPS were used. The accession codes for clustered GNPS and unclustered GNPS are available in Supplementary Data 1.
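For readers who want to work with these datasets programmatically, the accession codes above can be collected and sanity-checked before use. The sketch below is only a convenience example: the accession pattern (the prefix MSV followed by nine digits) is inferred from the codes listed here rather than from a repository specification, and the names `LANTHIPEPTIDE_ACCESSIONS`, `BENCHMARK_ACCESSIONS` and `validate_accessions` are hypothetical.

```python
import re

# Accession codes copied from the data availability statement.
LANTHIPEPTIDE_ACCESSIONS = [
    "MSV000090476", "MSV000090473", "MSV000090472", "MSV000090471",
    "MSV000090457", "MSV000089818", "MSV000089817", "MSV000089816",
    "MSV000089815", "MSV000089813", "MSV000088816", "MSV000088801",
    "MSV000088800", "MSV000088764", "MSV000088763",
]
BENCHMARK_ACCESSIONS = ["MSV000078787"]

# Assumed pattern: "MSV" followed by nine digits, as in every code above.
ACCESSION_RE = re.compile(r"MSV\d{9}")


def validate_accessions(codes):
    """Return the codes sorted; raise ValueError on malformed or duplicate codes."""
    bad = [c for c in codes if not ACCESSION_RE.fullmatch(c)]
    if bad:
        raise ValueError(f"malformed accession codes: {bad}")
    if len(set(codes)) != len(codes):
        raise ValueError("duplicate accession codes")
    return sorted(codes)


checked = validate_accessions(LANTHIPEPTIDE_ACCESSIONS + BENCHMARK_ACCESSIONS)
```

A check of this kind catches transcription errors, such as a dropped digit, before any download is attempted.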
Code availability
MASST+ and Networking+ are available at https://github.com/mohimanilab/MASSTplus. Other custom software used in this work includes Seq2Ripp (https://github.com/mohimanilab/seq2ripp), PepNovo (https://github.com/jmchilton/pepnovo) and Dereplicator (https://ccms-ucsd.github.io/GNPSDocumentation/dereplicator/).
Acknowledgements
The work of T.M.Y., M.M., Y.L., B.B. and H.M. was supported by National Institutes of Health New Innovator Award DP2GM137413, US Department of Energy award DE-SC0021340, National Science Foundation award DBI-2117640 and National Institute of General Medical Sciences of the National Institutes of Health award R43GM150301 (B.B. only). The work of P.C.D. and M.W. was supported by R03OD034493, U24DK133658 and R01GM107550 (P.C.D. only).
Author information
Contributions
M.M., T.M.Y., Y.L., M.G., L.L. and A.B. implemented the algorithms. M.M., T.M.Y. and Y.L. performed the analysis. M.W. designed and implemented the GNPS web service for MASST+. B.B., P.C.D. and H.M. designed and directed the work. M.M. and H.M. wrote the manuscript, and all authors contributed to its revision.
Ethics declarations
Competing interests
H.M. and B.B. are cofounders of and have equity interests in Chemia.ai, LLC. P.C.D. is an advisor of and holds equity in Cybele, consulted for MSD Animal Health in 2023 and is a cofounder of, holds equity in and is scientific advisor for Ometa Labs, Arome and Enveda with prior approval by the University of California San Diego. The remaining authors declare no competing interests.
Peer review
Peer review information
Nature Biotechnology thanks Marnix Medema and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information: Supplementary Figs. 1–13, Tables 1–8 and Data 1.
List of accession codes 1: Accession codes of data used for experiments.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Mongia, M., Yasaka, T.M., Liu, Y. et al. Fast mass spectrometry search and clustering of untargeted metabolomics data. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-023-01985-4
DOI: https://doi.org/10.1038/s41587-023-01985-4