RefMet: a reference nomenclature for metabolomics

To the Editor — The past decade has seen an explosive growth in metabolomics, with advances in mass spectrometry (MS) and nuclear magnetic resonance (NMR) enabling the detection of hundreds or even thousands of metabolite species in a single experiment. The wide range of available analytical methods coupled with the even wider range of metabolite databases (vendor-supplied proprietary databases, public-domain databases and private in-house databases) has unfortunately led to a pervasive problem wherein the same metabolite species may be reported by many different names. This nomenclature issue represents a significant barrier for comparative analysis of metabolomics data across studies generated by different institutions and/or platforms1. To this end, a repository of over 280,000 named analytes from over 1,400 MS and NMR studies in the National Metabolomics Data Repository (NMDR) on the Metabolomics Workbench2 has been leveraged to generate a highly curated analytical-chemistry-centric database of common names for metabolite structures and isobaric species. This Reference Set of Metabolite Names (RefMet) has been linked to a metabolite classification system, with numerous positive outcomes including data-sharing potential, facilitation of meta-analysis across studies, and integrated statistical analysis.

RefMet is composed of four groups of annotations (Supplementary Table 1):

  1. Annotations with complete structural characterization of regiochemistry, stereochemistry and double bond geometry obtained from targeted assays. These annotations have associated InChIKey, SMILES and molfiles and can be linked to exact structures in a metabolite database. Examples are prostaglandin E2 (PGE2), 12(S)-hydroxyeicosatetraenoic acid (12(S)-HETE) and α-muricholic acid.

  2. Annotations with characterization of the regiochemistry (structural ‘skeleton’ of the molecule) but with some or all chiral centers undefined and/or double bond geometry unknown. Examples are 12-HETE, 3-hydroxytetradecanoic acid, muricholic acid and 6,9-hexadecadienoic acid. These annotations may also have associated InChIKey, SMILES and molfiles and metabolite database entries, albeit with undefined chirality and/or double bond geometry.

  3. Annotations with information on structural features, but where complete regiochemistry is unknown. This is commonly encountered in experiments using tandem MS where molecular product ion and neutral-loss data is diagnostic for head groups, acyl chains and other fragments. Examples are phosphatidylcholine (PC) 16:0_18:1, ceramide (Cer) 18:1;O2/24:1 and hydroxytetradecanoic acid. These annotations are not linked to molecular database entries and do not have InChIKey, SMILES and molfiles because the regiochemistry is unknown.

  4. Annotations with information on metabolite class and sum-composition information. This level of annotation is typically encountered in untargeted MS experiments where the determinants are precursor ion m/z values and retention time information, and where tandem MS is not informative enough to characterize substructures. Examples are PC 34:1, triacylglycerol (TG) 54:3 and Cer 42:2;O2.
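The difference between groups 3 and 4 is visible in the lipid shorthand itself: a name such as PC 16:0_18:1 lists individual chains, whereas PC 34:1 gives only the summed carbons and double bonds. The following Python sketch illustrates that distinction with a simple pattern match. It is a heuristic written for this illustration, not the parser used by RefMet; the names `AnnotationGroup` and `lipid_annotation_group` are hypothetical, and the sketch makes no attempt to recognize groups 1 and 2 or non-lipid names.

```python
import re
from enum import IntEnum
from typing import Optional


class AnnotationGroup(IntEnum):
    """Hypothetical labels for two of the four annotation groups in the text."""
    CHAIN_RESOLVED = 3    # individual chains reported, regiochemistry unknown
    SUM_COMPOSITION = 4   # only class plus total carbons:double bonds


# One chain or sum composition: carbons:double_bonds, optionally ";O" or ";O<n>".
_CHAIN = r"\d+:\d+(?:;O\d*)?"
# Class abbreviation, a space, then exactly one carbons:double_bonds term.
_SUM_RE = re.compile(rf"^[A-Za-z][A-Za-z-]* {_CHAIN}$")
# Class abbreviation, a space, then two or more terms joined by "_" or "/".
_CHAINS_RE = re.compile(rf"^[A-Za-z][A-Za-z-]* {_CHAIN}(?:[_/]{_CHAIN})+$")


def lipid_annotation_group(name: str) -> Optional[AnnotationGroup]:
    """Guess the annotation group of a lipid shorthand name.

    Returns None when the name does not look like either shorthand form.
    """
    name = name.strip()
    if _CHAINS_RE.match(name):
        return AnnotationGroup.CHAIN_RESOLVED
    if _SUM_RE.match(name):
        return AnnotationGroup.SUM_COMPOSITION
    return None


for example in ["PC 16:0_18:1", "Cer 18:1;O2/24:1", "PC 34:1", "TG 54:3", "12-HETE"]:
    print(example, "->", lipid_annotation_group(example))
```

Names that do not follow the shorthand, such as 12-HETE or hydroxytetradecanoic acid, fall through to `None` and would need a lookup against a curated list instead.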

The names used in RefMet are generally based on common, officially accepted terms and incorporate notations appropriate for the level of detail of the analytical technique used. The RefMet nomenclature covers this range of notations within the four groups mentioned above. All metabolite species are classified by two approaches (Fig. 1a), containing three levels of hierarchy in each case (superclass, main class, subclass). Lipids are named3,4 and classified5,6 using the approaches developed by the LIPID MAPS consortium. Non-lipids are first classified automatically from their SMILES strings by the ClassyFire method7 and subsequently manually curated (Supplementary Table 2). Additional metadata including monoisotopic mass, formula, PubChem compound identifier and InChIKey (for exact structures) have been added to RefMet. RefMet is envisaged as an actively expanding open-source repository. As new metabolite annotations are encountered in studies submitted to the NMDR or are obtained through collaborative metabolomics projects, these data will be classified, curated and added to the RefMet database. To engage the research community in this process, we provide a web-based request form where users can submit a new metabolite or group of metabolites for inclusion into RefMet (Supplementary Table 3).
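To make the three-level hierarchy and the associated metadata concrete, a single entry can be pictured as a small record, and a set of entries can be summarized at any of the three levels. The Python sketch below assumes a hypothetical `RefMetEntry` layout; the field names and the class labels in the sample data are illustrative only and are not copied from the RefMet database.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class RefMetEntry:
    """Hypothetical record mirroring the metadata listed in the text."""
    name: str
    super_class: str
    main_class: str
    sub_class: str
    formula: Optional[str] = None
    monoisotopic_mass: Optional[float] = None
    pubchem_cid: Optional[int] = None
    inchikey: Optional[str] = None  # only meaningful for exact structures


LEVELS = ("super_class", "main_class", "sub_class")


def count_by_level(entries: Iterable[RefMetEntry], level: str) -> Counter:
    """Count entries per class at one level of the hierarchy."""
    if level not in LEVELS:
        raise ValueError(f"level must be one of {LEVELS}, got {level!r}")
    return Counter(getattr(entry, level) for entry in entries)


# Illustrative class labels, not taken from RefMet itself.
entries = [
    RefMetEntry("PC 34:1", "Glycerophospholipids", "Glycerophosphocholines", "PC"),
    RefMetEntry("PC 16:0_18:1", "Glycerophospholipids", "Glycerophosphocholines", "PC"),
    RefMetEntry("TG 54:3", "Glycerolipids", "Triradylglycerols", "TAG"),
]

for level in LEVELS:
    print(level, dict(count_by_level(entries, level)))
```

Because annotations in groups 3 and 4 have no structure identifiers, the structure-related fields are optional, while the classification fields are always present.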

Fig. 1: Overview of the central role of RefMet in the Metabolomics Workbench infrastructure.

a, Metabolite annotations reported in studies submitted to the NMDR are used as a key data source for development of the RefMet database. Metabolite names in each study in turn are harmonized and converted to their RefMet equivalents. b,c, RefMet names are linked to a database (b) of molecular structures (in the case of entries with defined structures) and to a metabolite classification system (c). d,e, The set of classified RefMet annotations may be used for multiple modes of statistical analysis (d) and summary reports (e). f, Biochemical pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Human Metabolome Database (HMDB) that have been supplemented with RefMet annotations are used on the Metabolomics Workbench (MW) to map NMDR study data using pathway enrichment tools. g, A REST service for RefMet enables data-sharing efforts with external metabolomics-related portals.
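The harmonization step in panel a can be sketched as a lookup from study-specific names to a common name, after which two studies can be compared directly. The Python example below is a minimal sketch under stated assumptions: the synonym table is a small hypothetical stand-in for the curated RefMet mapping, the function name `harmonize` is invented for this illustration, and matching is reduced to case-insensitive exact lookup.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical synonym table (lowercased key -> standardized name).
# A real mapping would come from the RefMet download or conversion tool.
SYNONYM_TO_REFMET: Dict[str, str] = {
    "prostaglandin e2": "PGE2",
    "pge2": "PGE2",
    "dinoprostone": "PGE2",
    "pc(34:1)": "PC 34:1",
    "pc 34:1": "PC 34:1",
}


def harmonize(
    names: Iterable[str], table: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Map reported names to standardized names.

    Returns (mapped, unmapped): a dict from each reported name to its
    standardized name, and a list of names with no match in the table.
    """
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for name in names:
        key = name.strip().lower()
        if key in table:
            mapped[name] = table[key]
        else:
            unmapped.append(name)
    return mapped, unmapped


study_a = ["Prostaglandin E2", "PC(34:1)", "unknown_742"]
study_b = [" PGE2 ", "PC 34:1"]

mapped_a, unmapped_a = harmonize(study_a, SYNONYM_TO_REFMET)
mapped_b, unmapped_b = harmonize(study_b, SYNONYM_TO_REFMET)

# Metabolites reported in both studies once names are harmonized.
shared = set(mapped_a.values()) & set(mapped_b.values())
print(sorted(shared), unmapped_a)
```

Without the mapping step the two studies share no names at all, which is the comparison barrier that a reference nomenclature is meant to remove.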

RefMet is available at https://www.metabolomicsworkbench.org/databases/refmet/index.php, with browsing and searching capabilities (Fig. 1b). The complete RefMet library is available for download. An online ‘name-to-RefMet’ conversion tool is available to convert user-supplied metabolite names to the RefMet nomenclature and is capable of performing bulk mapping of instrument- or software-specific annotations to the RefMet format. RefMet-related functions may also be accessed through a representational state transfer (REST) service on the Metabolomics Workbench website, which facilitates high-throughput programmatic access via its application programming interface (API) (Supplementary Table 3). Outreach efforts undertaken by the NMDR include a collaboration with LIPID MAPS to provide a RefMet-driven lipid standardization tool (https://lipidmaps.org/resources/tools/index.php?tab=nomenclature) and a collaboration with the LION/web group at the University of Utrecht (http://www.lipidontology.com) to perform enrichment analysis on experimental data across multiple ontological parameters. The Metabolomics Workbench provides a suite of analysis and statistics tools that use RefMet (Supplementary Fig. 1). Other methods deploying RefMet facilitate cross-study comparisons of either individual metabolite species or metabolite classes at different levels of granularity.
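Programmatic access through the REST service might look like the Python sketch below. The URL pattern (`/rest/refmet/name/<name>/all`) and the JSON response are assumptions based on the general layout of the Metabolomics Workbench REST interface and should be checked against the service documentation referenced in Supplementary Table 3; the function names `refmet_url` and `fetch_refmet` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the Metabolomics Workbench REST service.
BASE_URL = "https://www.metabolomicsworkbench.org/rest"


def refmet_url(name: str, output_item: str = "all") -> str:
    """Build a lookup URL for one metabolite name (assumed URL pattern)."""
    # Percent-encode everything so names with spaces or colons stay in one path segment.
    return f"{BASE_URL}/refmet/name/{quote(name, safe='')}/{output_item}"


def fetch_refmet(name: str, timeout: float = 30.0):
    """Fetch and decode the response for one name; requires network access."""
    with urlopen(refmet_url(name), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building URLs needs no network access.
print(refmet_url("Cholesterol"))
print(refmet_url("PC 34:1"))
```

Calling `fetch_refmet("Cholesterol")` would issue the request. For bulk conversion of many names, the online conversion tool or the downloadable library is likely to be the better starting point than one request per name.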

In summary, the development of the RefMet database provides a unifying nomenclature for reporting metabolites detected by analytical methods, thus complementing the existing reporting standards for experimental data8. An important consideration is its support for metabolite annotations with undefined regiochemistry (groups 3 and 4), for which an InChIKey, SMILES string or other structure-specific identifier cannot be generated. A unified metabolomics nomenclature has many practical benefits, including data sharing, meta-analysis across studies and integrated statistical analysis, and many of these are already implemented on the Metabolomics Workbench. As of September 2020, the RefMet database contains over 138,000 metabolite species, and this number is expected to continue to increase.

References

  1. Pham, D. N. K., Roy, M., Kreider-Mueller, A., Golen, J. A. & Manke, D. R. Metabolites 9, 28 (2019).

  2. Sud, M. et al. Nucleic Acids Res. 44, D463–D470 (2016).

  3. Fahy, E., Cotter, D., Sud, M. & Subramaniam, S. Biochim. Biophys. Acta 1811, 637–647 (2011).

  4. Liebisch, G. et al. J. Lipid Res. https://doi.org/10.1194/jlr.S120001025 (2020).

  5. Fahy, E. et al. J. Lipid Res. 46, 839–861 (2005).

  6. Fahy, E. et al. J. Lipid Res. 50 (Suppl.), S9–S14 (2009).

  7. Djoumbou Feunang, Y. et al. J. Cheminform. 8, 61 (2016).

  8. Sumner, L. W. et al. Metabolomics 3, 211–221 (2007).

Acknowledgements

This work is supported by NIH grants U2C-DK119886 and OT2-OD030544.

Author information

Contributions

RefMet was conceived by E.F. in discussion with S.S. E.F. carried out the development and the curation efforts. The manuscript was drafted by E.F. and revised by S.S. and E.F. S.S. is the principal investigator for the project.

Corresponding author

Correspondence to Eoin Fahy.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Editorial note: This article has been peer reviewed.

Supplementary information

Supplementary Fig. 1 and Tables 1–3

Cite this article

Fahy, E., Subramaniam, S. RefMet: a reference nomenclature for metabolomics. Nat Methods 17, 1173–1174 (2020). https://doi.org/10.1038/s41592-020-01009-y
