To the Editor — The past decade has seen an explosive growth in metabolomics, with advances in mass spectrometry (MS) and nuclear magnetic resonance (NMR) enabling the detection of hundreds or even thousands of metabolite species in a single experiment. The wide range of available analytical methods coupled with the even wider range of metabolite databases (vendor-supplied proprietary databases, public-domain databases and private in-house databases) has unfortunately led to a pervasive problem wherein the same metabolite species may be reported by many different names. This nomenclature issue represents a significant barrier for comparative analysis of metabolomics data across studies generated by different institutions and/or platforms1. To this end, a repository of over 280,000 named analytes from over 1,400 MS and NMR studies in the National Metabolomics Data Repository (NMDR) on the Metabolomics Workbench2 has been leveraged to generate a highly curated analytical-chemistry-centric database of common names for metabolite structures and isobaric species. This Reference Set of Metabolite Names (RefMet) has been linked to a metabolite classification system, with numerous positive outcomes including data-sharing potential, facilitation of meta-analysis across studies, and integrated statistical analysis.
RefMet is composed of four groups of annotations (Supplementary Table 1):
1. Annotations with complete structural characterization of regiochemistry, stereochemistry and double bond geometry obtained from targeted assays. These annotations have associated InChIKey, SMILES and molfiles and can be linked to exact structures in a metabolite database. Examples are prostaglandin E2 (PGE2), 12(S)-hydroxyeicosatetraenoic acid (12(S)-HETE) and α-muricholic acid.
2. Annotations with characterization of the regiochemistry (structural ‘skeleton’ of the molecule) but with some or all chiral centers undefined and/or double bond geometry unknown. Examples are 12-HETE, 3-hydroxytetradecanoic acid, muricholic acid and 6,9-hexadecadienoic acid. These annotations may also have associated InChIKey, SMILES and molfiles and metabolite database entries, albeit with undefined chirality and/or double bond geometry.
3. Annotations with information on structural features, but where complete regiochemistry is unknown. This is commonly encountered in experiments using tandem MS where molecular product ion and neutral-loss data are diagnostic for head groups, acyl chains and other fragments. Examples are phosphatidylcholine (PC) 16:0_18:1, ceramide (Cer) 18:1;O2/24:1 and hydroxytetradecanoic acid. These annotations are not linked to molecular database entries and do not have InChIKey, SMILES and molfiles because the regiochemistry is unknown.
4. Annotations with information on metabolite class and sum-composition information. This level of annotation is typically encountered in untargeted MS experiments where the determinants are precursor ion m/z values and retention time information, and where tandem MS is not informative enough to characterize substructures. Examples are PC 34:1, triacylglycerol (TG) 54:3 and Cer 42:2;O2.
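The relationship between the last two groups can be illustrated with the lipid examples given above: a chain-level name such as PC 16:0_18:1 collapses to the sum-composition name PC 34:1 by adding carbons, double bonds and oxygens across the chains. The following is a minimal sketch under stated assumptions, not RefMet's own implementation: the function names are hypothetical, and the parser handles only simple 'CLASS chains' shorthand with chains separated by '_' or '/'.

```python
import re

# One chain descriptor: carbons:double_bonds, optionally followed by ;O or ;O<n>.
# Assumption: only this simple shorthand is handled (no ether/plasmalogen prefixes).
CHAIN = re.compile(r"^(\d+):(\d+)(?:;O(\d*))?$")


def parse_chain(text):
    """Return (carbons, double_bonds, oxygens) for one chain descriptor."""
    match = CHAIN.match(text)
    if match is None:
        raise ValueError(f"unrecognized chain descriptor: {text!r}")
    carbons, double_bonds = int(match.group(1)), int(match.group(2))
    if match.group(3) is None:      # no ";O" suffix at all
        oxygens = 0
    elif match.group(3) == "":      # bare ";O" means one oxygen
        oxygens = 1
    else:                           # ";O2", ";O3", ...
        oxygens = int(match.group(3))
    return carbons, double_bonds, oxygens


def split_name(name):
    """Split 'PC 16:0_18:1' into ('PC', ['16:0', '18:1'])."""
    head, chains = name.split(" ", 1)
    return head, re.split(r"[_/]", chains)


def annotation_group(name):
    """Hypothetical helper: 3 if individual chains are listed, 4 if only a sum."""
    _, chains = split_name(name)
    for chain in chains:
        parse_chain(chain)          # validate every descriptor
    return 3 if len(chains) > 1 else 4


def sum_composition(name):
    """Collapse a chain-level name to its sum-composition name."""
    head, chains = split_name(name)
    carbons, double_bonds, oxygens = (
        sum(values) for values in zip(*(parse_chain(c) for c in chains))
    )
    suffix = "" if oxygens == 0 else ";O" if oxygens == 1 else f";O{oxygens}"
    return f"{head} {carbons}:{double_bonds}{suffix}"


print(sum_composition("PC 16:0_18:1"))      # PC 34:1
print(sum_composition("Cer 18:1;O2/24:1"))  # Cer 42:2;O2
```

The collapse is one-way: many chain-level names map to the same sum-composition name, which is why the two levels are kept as separate annotation groups.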
The names used in RefMet are generally based on common, officially accepted terms and incorporate notations appropriate for the level of detail of the analytical technique used. The RefMet nomenclature covers this range of notations within the four groups mentioned above. All metabolite species are classified by two approaches (Fig. 1a), containing three levels of hierarchy in each case (superclass, main class, subclass). Lipids are named3,4 and classified5,6 using the approaches developed by the LIPID MAPS consortium. Non-lipids are first classified automatically from their SMILES strings by the ClassyFire method7 and subsequently manually curated (Supplementary Table 2). Additional metadata including monoisotopic mass, formula, PubChem compound identifier and InChIKey (for exact structures) have been added to RefMet. RefMet is envisaged as an actively expanding open-source repository. As new metabolite annotations are encountered in studies submitted to the NMDR or are obtained through collaborative metabolomics projects, these data will be classified, curated and added to the RefMet database. To engage the research community in this process, we provide a web-based request form where users can submit a new metabolite or group of metabolites for inclusion into RefMet (Supplementary Table 3).
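To make the three-level hierarchy concrete, the sketch below models a RefMet-style record and groups names at a chosen level. It is an illustration only: the class `RefMetEntry`, the function `group_by_level` and the field names are hypothetical, and the class labels in `EXAMPLES` are illustrative values based on LIPID MAPS category names rather than values copied from the RefMet database.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

LEVELS = ("super_class", "main_class", "sub_class")


@dataclass(frozen=True)
class RefMetEntry:
    """Hypothetical record layout; field names are assumptions."""
    name: str
    super_class: str
    main_class: str
    sub_class: str
    formula: Optional[str] = None
    inchikey: Optional[str] = None  # expected only for exact structures


def group_by_level(entries, level):
    """Group metabolite names by one level of the classification hierarchy."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}")
    groups = defaultdict(list)
    for entry in entries:
        groups[getattr(entry, level)].append(entry.name)
    return dict(groups)


# Illustrative entries only; the labels are not taken from RefMet itself.
EXAMPLES = [
    RefMetEntry("PC 34:1", "Glycerophospholipids", "Glycerophosphocholines", "PC"),
    RefMetEntry("PC 16:0_18:1", "Glycerophospholipids", "Glycerophosphocholines", "PC"),
    RefMetEntry("TG 54:3", "Glycerolipids", "Triradylglycerols", "TG"),
    RefMetEntry("Cer 42:2;O2", "Sphingolipids", "Ceramides", "Cer"),
]

for label, names in group_by_level(EXAMPLES, "super_class").items():
    print(label, names)
```

Grouping at `main_class` or `sub_class` instead of `super_class` gives the finer levels of granularity at which classes can be compared.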
RefMet is available at https://www.metabolomicsworkbench.org/databases/refmet/index.php, with browsing and searching capabilities (Fig. 1b). The complete RefMet library is available for download. An online ‘name-to-RefMet’ conversion tool is available to convert user-supplied metabolite names to the RefMet nomenclature and is capable of performing bulk mapping of instrument- or software-specific annotations to the RefMet format. RefMet-related functions may also be accessed through a representational state transfer (REST) service on the Metabolomics Workbench website, which facilitates high-throughput programmatic access via its application programming interface (API) (Supplementary Table 3). Outreach efforts undertaken by the NMDR include a collaboration with LIPID MAPS to provide a RefMet-driven lipid standardization tool (https://lipidmaps.org/resources/tools/index.php?tab=nomenclature) and a collaboration with the LION/web group at the University of Utrecht (http://www.lipidontology.com) to perform enrichment analysis on experimental data across multiple ontological parameters. The Metabolomics Workbench provides a suite of analysis and statistics tools that use RefMet (Supplementary Fig. 1). Other methods deploying RefMet facilitate cross-study comparisons of either individual metabolite species or metabolite classes at different levels of granularity.
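As an illustration of programmatic access, the sketch below builds a lookup URL for the REST service and fetches the JSON response. The URL pattern (`/rest/refmet/name/<name>/all`) is an assumption based on the Metabolomics Workbench REST documentation and should be checked against the current documentation before use; the function names are hypothetical, and the structure of the returned JSON is not assumed here.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base path for the RefMet context of the REST service.
BASE_URL = "https://www.metabolomicsworkbench.org/rest/refmet"


def refmet_name_url(name):
    """Build a lookup URL for one metabolite name (assumed URL pattern)."""
    # Percent-encode everything that is not URL-safe, including ':' and '/'.
    return f"{BASE_URL}/name/{quote(name, safe='')}/all"


def fetch_refmet(name, timeout=30):
    """Fetch the record for a name; returns whatever JSON the service sends."""
    with urlopen(refmet_name_url(name), timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    print(refmet_name_url("Cholesterol"))
    # Network call, left commented out so the sketch runs offline:
    # print(fetch_refmet("Cholesterol"))
```

Names containing spaces, colons or slashes (for example PC 34:1) need percent-encoding as shown; whether the service accepts every encoded form is not verified here.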
In summary, the development of the RefMet database provides a unifying nomenclature for reporting metabolites detected by analytical methods, thus complementing the existing reporting standards for experimental data8. An important consideration is its support for metabolite annotations with undefined regiochemistry (groups 3 and 4), for which an InChIKey, SMILES string or other structure-specific identifier cannot be generated. A unified metabolomics nomenclature has many positive outcomes, a number of which are already exploited by the Metabolomics Workbench. As of September 2020, the RefMet database contains over 138,000 metabolite species, and this number is expected to continue to increase.
Pham, D. N. K., Roy, M., Kreider-Mueller, A., Golen, J. A. & Manke, D. R. Metabolites 9, 28 (2019).
Sud, M. et al. Nucleic Acids Res. 44, D463–D470 (2016).
Fahy, E., Cotter, D., Sud, M. & Subramaniam, S. Biochim. Biophys. Acta 1811, 637–647 (2011).
Liebisch, G. et al. J. Lipid Res. https://doi.org/10.1194/jlr.S120001025 (2020).
Fahy, E. et al. J. Lipid Res. 46, 839–861 (2005).
Fahy, E. et al. J. Lipid Res. 50, S9–S14 (2009).
Djoumbou Feunang, Y. et al. J. Cheminform. 8, 61 (2016).
Sumner, L. W. et al. Metabolomics 3, 211–221 (2007).
This work is supported by NIH grants U2C-DK119886 and OT2-OD030544.
The authors declare no competing interests.
Editorial note: This article has been peer reviewed.
Fahy, E., Subramaniam, S. RefMet: a reference nomenclature for metabolomics. Nat Methods 17, 1173–1174 (2020). https://doi.org/10.1038/s41592-020-01009-y