Abstract
A typical output of a metabolomic experiment is a peak table recording the intensities of the measured signals. Peak table processing, an essential procedure in metabolomics, is characterized by its study dependency and combinatorial diversity. While various methods and tools have been developed to facilitate metabolomic data processing, it is challenging to determine which processing workflow will perform well for a specific metabolomic study. NOREVA, an out-of-the-box protocol, was therefore developed to meet this challenge. First, the peak table is subjected to many processing workflows that consist of three to five defined calculations in combinatorially determined sequences. Second, the results of each workflow are judged against objective performance criteria. Third, various benchmarks are analyzed to highlight the uniqueness of this newly developed protocol in (1) evaluating the processing performance based on multiple criteria, (2) optimizing data processing by scanning thousands of workflows, and (3) allowing data processing for time-course and multiclass metabolomics. This protocol is implemented in an R package for convenient accessibility and to protect users’ data privacy. Basic experience with the R language facilitates use of this protocol, and execution time ranges from several minutes to a couple of hours, depending on the size of the analyzed data.
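The idea of scanning combinatorially assembled workflows and ranking them against a performance criterion can be illustrated with a minimal sketch. It is written in Python for brevity and is not NOREVA's R interface: the toy peak table, the three step families, every function name, and the single criterion (median coefficient of variation across features, lower ranked first) are assumptions made for illustration only. The step order is fixed here, whereas the protocol describes three to five calculations and multiple criteria.

```python
import itertools
import math
import statistics

# Toy peak table (hypothetical): rows are samples, columns are features.
# None marks a missing intensity.
PEAK_TABLE = [
    [120.0, 30.0, None],
    [150.0, 28.0, 900.0],
    [110.0, None, 880.0],
    [400.0, 90.0, 2500.0],
]

def impute_half_min(table):
    """Fill missing values with half of the feature's minimum observed value."""
    fills = [min(v for v in col if v is not None) / 2 for col in zip(*table)]
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in table]

def impute_median(table):
    """Fill missing values with the feature's median observed value."""
    fills = [statistics.median(v for v in col if v is not None) for col in zip(*table)]
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in table]

def transform_none(table):
    """Leave intensities unchanged (returns a copy)."""
    return [row[:] for row in table]

def transform_log2(table):
    """Log2-transform every intensity (all toy values are > 1)."""
    return [[math.log2(v) for v in row] for row in table]

def normalize_total_sum(table):
    """Divide each sample by its total intensity."""
    return [[v / sum(row) for v in row] for row in table]

def normalize_median(table):
    """Divide each sample by its median intensity."""
    return [[v / statistics.median(row) for v in row] for row in table]

# One list of alternatives per step; a workflow picks one method from each list.
STEPS = [
    [impute_half_min, impute_median],
    [transform_none, transform_log2],
    [normalize_total_sum, normalize_median],
]

def run_workflow(table, workflow):
    """Apply the chosen methods to the table in order."""
    for step in workflow:
        table = step(table)
    return table

def median_feature_cv(table):
    """Illustrative criterion: median coefficient of variation over features."""
    cvs = [statistics.pstdev(col) / statistics.mean(col) for col in zip(*table)]
    return statistics.median(cvs)

def rank_workflows(table, steps=STEPS):
    """Score every combination of methods and sort by ascending score."""
    results = []
    for workflow in itertools.product(*steps):
        processed = run_workflow(table, workflow)
        names = tuple(f.__name__ for f in workflow)
        results.append((median_feature_cv(processed), names))
    return sorted(results)

ranked = rank_workflows(PEAK_TABLE)
for score, names in ranked:
    print(f"{score:.4f}  {' -> '.join(names)}")
```

With two alternatives at each of three steps, the sketch scores 2 × 2 × 2 = 8 workflows; adding methods or steps grows the count multiplicatively, which is how a scan can reach thousands of workflows. A single criterion such as this one can favor particular transformations simply because they compress the data, which is one reason to judge workflows against several criteria rather than one.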
Data availability
All data used in this publication have been made available on the NOREVA website (https://idrblab.org/noreva/NOREVA_exampledata.zip) or are available from the corresponding author upon request.
Code availability
All code that constitutes the protocol provided in this study is available for use under a GPL v3 license and can be downloaded from GitHub at https://github.com/idrblab/NOREVA. The NOREVA service is freely available for academic use at https://idrblab.org/noreva/.
Acknowledgements
Funded by Natural Science Foundation of Zhejiang Province (LR21H300001); National Natural Science Foundation of China (81872798 and U1909208); Leading Talent of the ‘Ten Thousand Plan’–National High-Level Talents Special Support Plan of China; Fundamental Research Fund for Central Universities (2018QNA7023); ‘Double Top-Class’ University Project (181201*194232101); Key R&D Program of Zhejiang Province (2020C03010). This work was supported by Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare; Alibaba Cloud; Information Technology Center of Zhejiang University.
Author information
Contributions
F.Z. conceived the idea and designed the study. J.B.F., Y.Z., Y.H.M., Q.X.Y. and J.T. wrote and debugged codes. J.B.F. and Y.Z. performed the benchmark data analyses. J.B.F., Y.Z., Y.X.W., H.N.Z., J.L., J.T., Q.X.Y., H.C.S., W.Q.Q., Z.R.L. and M.Y.Z. contributed to statistics and data visualization. F.Z. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Peer review information: Nature Protocols thanks Jianguo Xia and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Lee, N. Y. et al. Gut Microbes 11, 882–899 (2020): https://www.tandfonline.com/doi/full/10.1080/19490976.2020.1712984
Taverna, F. et al. Nucleic Acids Res. 48, W385–W394 (2020): https://academic.oup.com/nar/article/48/W1/W385/5835814
Whitson, J. A. et al. Aging Cell 19, e13213 (2020): https://onlinelibrary.wiley.com/doi/10.1111/acel.13213
González-Riano, C. et al. Anal. Chem. 92, 203–226 (2020): https://pubs.acs.org/doi/10.1021/acs.analchem.9b04553
Woollam, M. et al. J. Proteome Res. 19, 1913–1922 (2020): https://pubs.acs.org/doi/10.1021/acs.jproteome.9b00722
Supplementary information
Supplementary Figs. 1–6, Supplementary Tables 1–8 and Supplementary Methods 1–4.
About this article
Cite this article
Fu, J., Zhang, Y., Wang, Y. et al. Optimization of metabolomic data processing using NOREVA. Nat Protoc 17, 129–151 (2022). https://doi.org/10.1038/s41596-021-00636-9