
  • Brief Communication

Fast mass spectrometry search and clustering of untargeted metabolomics data

Abstract

The throughput of mass spectrometers and the amount of publicly available metabolomics data are growing rapidly, but analysis tools such as molecular networking and the Mass Spectrometry Search Tool (MASST) do not scale to searching and clustering the billions of mass spectra in metabolomics repositories. To address this limitation, we designed MASST+ and Networking+, which can process datasets that are up to three orders of magnitude larger than those processed by state-of-the-art tools.
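The abstract does not describe how the search is accelerated, although Fig. 1 is titled "Fast scoring with indexing". As a rough illustration only, the sketch below shows one generic way an inverted index over fragment peaks can avoid comparing a query against every spectrum in a library. It is not the MASST+ implementation: the bin width, the normalized dot-product score, the 0.7 threshold, the toy spectra and all function names are assumptions made for this example.

```python
from collections import defaultdict
from math import floor, sqrt

BIN_WIDTH = 0.02  # assumed m/z bin width; not taken from the paper


def binned(spectrum, bin_width=BIN_WIDTH):
    """Collapse (mz, intensity) peaks into a unit-norm {bin: intensity} map."""
    bins = defaultdict(float)
    for mz, intensity in spectrum:
        bins[floor(mz / bin_width)] += intensity
    norm = sqrt(sum(v * v for v in bins.values()))
    return {b: v / norm for b, v in bins.items()} if norm else {}


def build_index(library):
    """Map each m/z bin to the (spectrum_id, intensity) pairs that occupy it."""
    index = defaultdict(list)
    for spec_id, spectrum in library.items():
        for b, v in binned(spectrum).items():
            index[b].append((spec_id, v))
    return index


def search(index, query, threshold=0.7):
    """Score the query only against spectra that share at least one bin."""
    scores = defaultdict(float)
    for b, qv in binned(query).items():
        for spec_id, v in index.get(b, ()):
            scores[spec_id] += qv * v
    return {s: sc for s, sc in scores.items() if sc >= threshold}


def brute_force(library, query, threshold=0.7):
    """Reference implementation: compare the query with every spectrum."""
    q = binned(query)
    hits = {}
    for spec_id, spectrum in library.items():
        s = binned(spectrum)
        score = sum(qv * s.get(b, 0.0) for b, qv in q.items())
        if score >= threshold:
            hits[spec_id] = score
    return hits


# Toy library of three hypothetical spectra (made-up peaks).
library = {
    "a": [(100.00, 10.0), (150.05, 5.0), (200.10, 1.0)],
    "b": [(100.00, 9.0), (150.05, 6.0), (250.30, 2.0)],
    "c": [(300.00, 4.0), (310.20, 4.0)],
}
query = [(100.00, 10.0), (150.05, 5.0)]

index = build_index(library)
hits = search(index, query)
reference = brute_force(library, query)
```

In this toy setting the indexed search returns the same hits as the brute-force comparison while never touching spectrum "c", which shares no peak bin with the query. The work per query then scales with the number of matching peaks rather than with the size of the library.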


Fig. 1: Fast scoring with indexing.
Fig. 2: MASST+, Clustering+ and Networking+ enable lanthipeptide discovery.


Data availability

The datasets analyzed are available at gnps.ucsd.edu. Accession codes related to the lanthipeptides part of the study are MSV000090476, MSV000090473, MSV000090472, MSV000090471, MSV000090457, MSV000089818, MSV000089817, MSV000089816, MSV000089815, MSV000089813, MSV000088816, MSV000088801, MSV000088800, MSV000088764 and MSV000088763. For comparing MASST+ and Networking+ against previous state-of-the-art tools, datasets MSV000078787, clustered GNPS, and unclustered GNPS were used. The accession codes for clustered GNPS and unclustered GNPS are available in Supplementary Data 1.

Code availability

MASST+ and Networking+ are available at https://github.com/mohimanilab/MASSTplus. Other custom software used in this work includes Seq2Ripp (https://github.com/mohimanilab/seq2ripp), PepNovo (https://github.com/jmchilton/pepnovo) and Dereplicator (https://ccms-ucsd.github.io/GNPSDocumentation/dereplicator/).


Acknowledgements

The work of T.M.Y., M.M., Y.L., B.B. and H.M. was supported by National Institutes of Health New Innovator Award DP2GM137413, US Department of Energy award DE-SC0021340, National Science Foundation award DBI-2117640 and National Institute of General Medical Sciences of the National Institutes of Health award R43GM150301 (B.B. only). The work of P.C.D. and M.W. was supported by R03OD034493, U24DK133658 and R01GM107550 (P.C.D. only).

Author information


Contributions

M.M., T.M.Y., Y.L., M.G., L.L. and A.B. implemented the algorithms. M.M., T.M.Y. and Y.L. performed the analysis. M.W. designed and implemented the GNPS web service for MASST+. B.B., P.C.D. and H.M. designed and directed the work. M.M. and H.M. wrote the manuscript, and all authors contributed to its revision.

Corresponding author

Correspondence to Hosein Mohimani.

Ethics declarations

Competing interests

H.M. and B.B. are cofounders of and have equity interests in Chemia.ai, LLC. P.C.D. is an advisor of and holds equity in Cybele, consulted for MSD Animal Health in 2023 and is a cofounder of, holds equity in and is scientific advisor for Ometa Labs, Arome and Enveda with prior approval by the University of California San Diego. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Marnix Medema and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–13, Tables 1–8 and Data 1.

Reporting Summary

List of accession codes 1

Accession codes of data used for experiments.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mongia, M., Yasaka, T.M., Liu, Y. et al. Fast mass spectrometry search and clustering of untargeted metabolomics data. Nat Biotechnol (2024). https://doi.org/10.1038/s41587-023-01985-4


  • DOI: https://doi.org/10.1038/s41587-023-01985-4
