Abstract
Liquid chromatography–high-resolution mass spectrometry (LC-MS)-based metabolomics aims to identify and quantify all metabolites, but most LC-MS peaks remain unidentified. Here we present a global network optimization approach, NetID, to annotate untargeted LC-MS metabolomics data. The approach aims to generate, for all experimentally observed ion peaks, annotations that match the measured masses, retention times and (when available) tandem mass spectrometry fragmentation patterns. Peaks are connected based on mass differences reflecting adduction, fragmentation, isotopes, or feasible biochemical transformations. Global optimization generates a single network linking most observed ion peaks, enhances peak assignment accuracy, and produces chemically informative peak–peak relationships, including for peaks lacking tandem mass spectrometry spectra. Applying this approach to yeast and mouse data, we identified five previously unrecognized metabolites (thiamine derivatives and N-glucosyl-taurine). Isotope tracer studies indicate active flux through these metabolites. Thus, NetID applies existing metabolomic knowledge and global optimization to substantially improve annotation coverage and accuracy in untargeted metabolomics datasets, facilitating metabolite discovery.
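The peak-connection step described above can be illustrated with a minimal sketch. It assumes a small, hypothetical rule table (the actual atom difference rules are in Supplementary Data 1), made-up peak values, and illustrative tolerances; the function name `candidate_edges` is not from NetID. It covers only the proposal of candidate connections from mass differences, not the global optimization that selects among them.

```python
from itertools import combinations

# Hypothetical, abbreviated rule table (assumption, not the NetID table):
# (name, mass difference in Da, whether the two peaks must co-elute)
RULES = [
    ("13C isotope", 1.003355, True),
    ("Na-H adduct", 21.981944, True),
    ("CH2 transformation", 14.015650, False),
    ("HPO3 transformation", 79.966331, False),
]

def candidate_edges(peaks, ppm=5.0, rt_tol=0.1):
    """Propose peak-peak connections from mass differences.

    peaks: list of (peak_id, m/z, retention time in min).
    Isotope and adduct rules require similar retention times;
    biochemical transformations do not.
    """
    edges = []
    for a, b in combinations(peaks, 2):
        light, heavy = sorted((a, b), key=lambda p: p[1])
        delta = heavy[1] - light[1]
        tol = heavy[1] * ppm * 1e-6  # absolute tolerance in Da
        for name, mass, needs_coelution in RULES:
            mass_ok = abs(delta - mass) <= tol
            rt_ok = (not needs_coelution) or abs(light[2] - heavy[2]) <= rt_tol
            if mass_ok and rt_ok:
                edges.append((light[0], heavy[0], name))
    return edges

# Made-up peaks for illustration only.
PEAKS = [
    ("p1", 180.063390, 5.00),
    ("p2", 181.066745, 5.02),  # p1 + 13C, co-eluting
    ("p3", 194.079040, 7.30),  # p1 + CH2, different retention time
    ("p4", 202.045334, 5.00),  # p1 + (Na - H), co-eluting
]
edges = candidate_edges(PEAKS)
for edge in edges:
    print(edge)
```

Separating co-elution-dependent rules (isotopes, adducts) from retention-time-independent ones (biochemical transformations) is one simple way to reflect the distinction the text draws between artifact and metabolite relationships.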
Data availability
All LC-MS data, including the yeast and mouse metabolomics datasets, the 13C labeling datasets, and more than 2,000 targeted MS2 files collected from the liver data in mzXML format were deposited in MassIVE (ID no. MSV000087434). R code for generating NetID statistics and for performing false discovery rate analysis in Fig. 2 and Extended Data Fig. 1 is provided on GitHub (https://github.com/LiChenPU/NetID/releases/tag/v1.0) and Zenodo (https://zenodo.org/record/5508337). The atom difference rule table is provided in Supplementary Data 1. The peak table for the yeast negative-mode data, together with the NetID annotation results, putative metabolite list and manual curation results, is provided in Supplementary Data 2. An in-house retention time list for known metabolites is provided in Supplementary Data 3. The HMDB, YMDB, PubChemLite and PubChemLite_bio reference compound databases (customized to contain relevant information) are provided in Supplementary Data 4–7, and MS2 spectra of newly discovered metabolites are provided in Supplementary Data 8.
Code availability
NetID was developed mainly in R and also uses IBM ILOG CPLEX Optimization Studio, Matlab and Python. NetID code and example files are available for non-commercial use on GitHub at https://github.com/LiChenPU/NetID/releases/tag/v1.0 and on Zenodo at https://zenodo.org/record/5508337, under the GNU General Public License v3.0. A user guide and pseudocode are provided in Supplementary Notes 3 and 4.
Acknowledgements
This work was supported by a Department of Energy (DOE) grant (no. DE-SC0012461 to J.D.R.), the Center for Advanced Bioenergy and Bioproducts Innovation (grant no. DE-SC0018420, subcontract to J.D.R.), NIH grant R50CA211437 to W.L. and the Howard Hughes Medical Institute and Burroughs Wellcome Fund via the PDEP and Hanna H. Gray Fellows Programs to M.R.M. The authors thank I. Pelczer at the NMR facility of the Department of Chemistry at Princeton University for the NMR analysis, the Metabolomics and Lipidomics Mass Spectrometry Core Facility of IMIB at Fudan University for additional mass spectrometry support, and X. Su and Y. An for scientific discussion and help. The Center for Advanced Bioenergy and Bioproducts Innovation and the Center for Bioenergy Innovation are both US Department of Energy Bioenergy Research Centers supported by the Office of Biological and Environmental Research in the DOE Office of Science. Any opinions, findings, conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the US Department of Energy.
Author information
Contributions
L.C., M.S. and J.D.R. conceived the project. L.C., X.X. and Z.C. wrote the NetID algorithm code. W.L., L.W., X.Z., A.C. and M.R.M. performed the mouse experiments. L.W., W.L. and L.C. performed the experiments on yeast. L.C., W.L., L.W. and X.X. analyzed the LC-MS and LC-MS/MS data. X.T., A.D.M. and Y.S. contributed to code development. B.J.K., A.M.L. and S.R.C. synthesized taurine-related compounds. L.C. and J.D.R. wrote the paper. All of the authors discussed the results and commented on the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information Nature Methods thanks Pieter Dorrestein and Justin van der Hooft for their contribution to the peer review of this work. Arunima Singh was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Characterization of NetID network.
(a) Summary table of the candidate annotation step in NetID workflow. (b) Visualization of the optimal network obtained from negative mode LC-MS analysis of baker’s yeast, containing 4851 nodes and 9699 connections. Metabolite and putative metabolite peaks are in green and artifact peaks in purple. (c) Connectivity of NetID network from the yeast negative-mode dataset.
Extended Data Fig. 2 Examples of putative metabolites in yeast negative-mode dataset.
(a-c) Subnetwork surrounding glutathione (a), glycerophosphocholine (b), and xanthurenic acid (c). (d) Peak properties and annotations for putative metabolites (yellow nodes) in subnetworks (a)-(c).
Extended Data Fig. 3 Evaluation of annotation false discovery rate (FDR) and fraction gold-standard peaks annotated correctly using different reference databases.
The four tested reference compound databases are HMDB (Human Metabolome Database), PBCM (short for PubChemLite.0.2.0, zenodo.org/record/3611238), PBCM_BIO (short for PubChemLite_BioPathway, a subset of biopathway-related entries in PubChemLite.0.2.0) and YMDB (Yeast Metabolome Database). (a) False discovery rate estimated using a target-decoy strategy. (b) Fraction of 314 manually curated ‘ground truth’ annotations made correctly. For a and b, each individual data point (circle) is from a different randomized decoy library. N = 10 randomized libraries were tested for each reference compound database. Boxes show the median and IQR, and whiskers extend to the largest and smallest values no further than ±1.5 × IQR from the hinge.
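As a rough illustration of how a target-decoy estimate can be summarized across randomized decoy libraries, the sketch below uses one common estimator (decoy hits divided by target hits). This is an assumption for illustration and not necessarily the exact procedure used for this figure; the function name `fdr_estimate` and all counts are made up.

```python
from statistics import median

def fdr_estimate(n_decoy_hits, n_target_hits):
    """Simple target-decoy estimate: decoy annotations per target annotation."""
    if n_target_hits == 0:
        raise ValueError("no target annotations to evaluate")
    return n_decoy_hits / n_target_hits

# Made-up (decoy hits, target hits) counts for three randomized decoy
# libraries; illustration only, not values from the paper.
runs = [(12, 400), (8, 400), (10, 400)]
fdrs = [fdr_estimate(d, t) for d, t in runs]
median_fdr = median(fdrs)
print(fdrs, median_fdr)
```

Repeating the estimate over several randomized libraries, as the caption describes, gives a spread of values that can then be summarized by a median and interquartile range.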
Extended Data Fig. 4 Subnetwork surrounding thiamine with additional known structures.
Nodes, connections, and formulae are direct output of NetID. Boxes with structures were manually added.
Extended Data Fig. 5 Evidence for the additional thiamine-derived metabolites.
As in Fig. 3, when unlabeled thiamine is added to [U-13C]glucose culture medium, yeast take up the unlabeled thiamine, resulting in unlabeled thiamine, M + 4 labeled thiamine + [C4H6O3] and thiamine + [C4H8O] species (n = 5). The proposed formulae are also supported by the m/z measured by high-resolution mass spectrometry. Bars represent mean values and error bars indicate s.d.
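For readers unfamiliar with the M + n notation, the expected m/z of a 13C-labeled isotopologue can be worked out from the 13C–12C mass difference. The sketch below is a generic calculation; the function name `isotopologue_mz` and the example m/z are hypothetical and not taken from the paper.

```python
C13_C12_DELTA = 1.003355  # mass difference between 13C and 12C, in Da

def isotopologue_mz(mz_unlabeled, n_13c, charge=1):
    """Expected m/z after replacing n_13c carbons with 13C (charge must be nonzero)."""
    return mz_unlabeled + n_13c * C13_C12_DELTA / abs(charge)

mz_m0 = 300.0  # hypothetical unlabeled (M + 0) m/z, for illustration only
mz_m4 = isotopologue_mz(mz_m0, 4)  # M + 4: four carbons derived from [U-13C]glucose
print(round(mz_m4, 5))
```

An M + 4 species therefore appears about 4.013 m/z units above its unlabeled counterpart for a singly charged ion, which is consistent with a four-carbon moiety derived from labeled glucose attached to unlabeled thiamine.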
Extended Data Fig. 6 Subnetwork surrounding taurine with additional known structures.
Nodes, connections, and formulae are direct output of NetID. Boxes with structures were manually added.
Extended Data Fig. 7 SelTOCSY NMR confirmation of the structure of the chemically synthesized N-glucosyl-taurine.
The final crude material is a mixture of glucose, taurine, and N-glucosyl-taurine at 5.2% (pink line). Comparison of the N-glucosyl-taurine (yellow) NMR experiment with those of alpha- (blue) and beta-glucose (green) indicates that C1 of the glucosyl group connects to the amine group of taurine in the α-position.
Extended Data Fig. 8 Glucosyl-taurine is a liver metabolite, not an ex vivo reaction product.
To test for ex vivo production of glucosyl-taurine, liver extract (with or without spiked 55 μM [U-13C]glucose) or extraction buffer (40:40:20 ACN:MeOH:H2O + NH4HCO3 or 50:50 MeOH:H2O) containing pure glucose and taurine were incubated at 5 °C for the indicated duration. Metabolites formed by ex vivo reactions typically accumulate upon sample incubation, while glucosyl-taurine does not. Moreover, there is minimal assimilation of [U-13C]glucose into glucosyl-taurine to make M + 6 glucosyl-taurine in liver extract, and, while trace glucosyl-taurine can be formed abiotically in acetonitrile:methanol:water at pH = 7, the observed biological quantity is 100-fold greater.
Supplementary information
Supplementary Information
Supplementary Tables 1–5, Supplementary Notes 1–4.
Supplementary Data 1
Atom difference rule table
Supplementary Data 2
NetID annotation for the yeast negative-mode dataset
Supplementary Data 3
In-house retention time list
Supplementary Data 4
HMDB reference compound database
Supplementary Data 5
YMDB reference compound database
Supplementary Data 6
PubChemLite reference compound database
Supplementary Data 7
PubChemLite_bio reference compound database
Supplementary Data 8
MS2 spectra of newly discovered metabolites
About this article
Cite this article
Chen, L., Lu, W., Wang, L. et al. Metabolite discovery through global annotation of untargeted metabolomics data. Nat Methods 18, 1377–1385 (2021). https://doi.org/10.1038/s41592-021-01303-3
DOI: https://doi.org/10.1038/s41592-021-01303-3