We report a computational approach (implemented in MS-DIAL 3.0; http://prime.psc.riken.jp/) for metabolite structure characterization using fully 13C-labeled and non-labeled plants and LC–MS/MS. Our approach facilitates carbon number determination and metabolite classification for unknown molecules. Applying our method to 31 tissues from 12 plant species, we assigned 1,092 structures and 344 formulae to 3,604 carbon-determined metabolite ions, 69 of which were found to represent structures currently not listed in metabolome databases.
Access optionsAccess options
Subscribe to Journal
Get full journal access for 1 year
only $20.17 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
All program packages and source codes are freely available at http://prime.psc.riken.jp/.
Tsugawa, H. Curr. Opin. Biotechnol. 54, 10–17 (2018).
Harvey, A. L., Edrada-Ebel, R. & Quinn, R. J. Nat. Rev. Drug. Discov. 14, 111–129 (2015).
Banerjee, P. et al. Nucleic Acids Res. 43, D935–D939 (2015).
Wang, M. et al. Nat. Biotechnol. 34, 828–837 (2016).
Mahieu, N. G. & Patti, G. J. Anal. Chem. 89, 10397–10406 (2017).
Giavalisco, P. et al. Plant J. 68, 364–376 (2011).
Chokkathukalam, A., Kim, D. H., Barrett, M. P., Breitling, R. & Creek, D. J. Bioanalysis 6, 511–524 (2014).
Sumner, L. W. et al. Metabolomics. 3, 211–221 (2007).
Tsugawa, H. et al. Anal. Chem. 88, 7946–7958 (2016).
Kim, S.-Y. & Volsky, D. J. BMC Bioinformatics 6, 144 (2005).
Falkner, J. A., Falkner, J. W., Yocum, A. K. & Andrews, P. C. J. Proteome Res. 7, 4614–4622 (2008).
Maciel, E. et al. Chem. Phys. Lipids 174, 1–7 (2013).
Nakabayashi, R. et al. Anal. Chem. 85, 1310–1315 (2013).
Mütsch-Eckner, M., Erdelmeier, C. A., Sticher, O. & Reuter, H. D. J. Nat. Prod. 56, 864–869 (1993).
Kooke, R. et al. Plant Physiol. 170, 2187–2203 (2016).
Krasensky, J. & Jonak, C. J. Exp. Bot. 63, 1593–1608 (2012).
Tsugawa, H., Ikeda, K. & Arita, M. Biochim. Biophys. Acta Mol. Cell. Biol. Lipids 1862, 762–765 (2017).
Warth, B. et al. Anal. Chem. 89, 11505–11513 (2017).
Scheubert, K. et al. Nat. Commun. 8, 1494 (2017).
Markley, J. L. et al. Curr. Opin. Biotechnol. 43, 34–40 (2017).
Saito, K. et al. Plant Cell Rep. 20, 267–271 (2001).
Udomsom, N. et al. Front. Plant Sci. 7, 1–14 (2016).
Bac-Molenaar, J. A., Vreugdenhil, D., Granier, C. & Keurentjes, J. J. B. J. Exp. Bot. 66, 5567–5580 (2015).
Bac-Molenaar, J. A., Granier, C., Keurentjes, J. J. B. & Vreugdenhil, D. Plant Cell Environ. 39, 88–102 (2016).
Begley, P. et al. Anal. Chem. 81, 7038–7046 (2009).
Huang, Z. et al. J. Med. Chem. 61, 1833–1844 (2018).
Han, L. K. & Saito, M. Japan Patent JP5507437B2 (2010).
Tsugawa, H. et al. Nat. Methods 12, 523–526 (2015).
Domingo-Almenara, X., Montenegro-Burke, J. R., Benton, H. P. & Siuzdak, G. Anal. Chem. 90, 480–489 (2018).
Djoumbou Feunang, Y. et al. J. Cheminform. 8, 1–20 (2016).
Guijas, C. et al. Anal. Chem. 90, 3156–3164 (2018).
Horai, H. et al. J. Mass. Spectrom. 45, 703–714 (2010).
Sawada, Y. et al. Phytochemistry 82, 38–45 (2012).
Qiu, F., Fine, D. D., Wherritt, D. J., Lei, Z. & Sumner, L. W. Anal. Chem. 88, 11373–11383 (2016).
Lai, Z. et al. Nat. Methods 15, 53–56 (2018).
Subramanian, A. et al. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Morreel, K. et al. Plant Cell 26, 929–945 (2014).
This work was partially supported by KAKENHI (JSPS 15K01812 (H.T.), 15H05897 (M.A.), 16H06454 (M.Y.), 17H03621 (M.A.), 18H02432 (H.T.), 18K19155 (H.T.); MHLW 30190401 (M.A.)), MAFF Science and Technology Research Promotion Program for Agriculture, Forestry, Fisheries and Food Industry (R.N.), JST the Strategic International Research Cooperative Program (R.N.), JST National Bioscience Database Center (M.A.), JST the Strategic International Collaborative Research Program (K.S.), Chiba University the GP Program (K.S.) and the Japan Advanced Plant Science Network (K.S.). We thank the HMDB, LipidMAPS and PMN consortiums for providing the SDF files; ChemAxon for a free research license for the Marvin and JChem cheminformatics tools; K. Yonekura-Sakakibara, Y. Higashi, Y. Sawada and K. Mochida for discussions on plant metabolomics; M. Yokota Hirai for providing the place for compound synthesis (RIKEN CSRS); S. Tsugawa for data curation (RIKEN IMS); and U. Petralia for English editing. This work was also supported by the Centre of BioSystems Genomics (no. AA3-WU-PL) (R.K.), the ‘Learning from Nature’ program funded by the Dutch Technology Foundation (STW grant number 10996), which is part of the Netherlands Organization for Scientific Research (J.A.B.-M.), the European Plant Phenotyping Network (EPPN, grant agreement no. 284443) funded by the FP7 Research Infrastructures Program of the European Union (J.A.B.-M.) and the NUE-CROPS project (grant agreement no. 222645) funded by the FP7-CP-IP Research Infrastructures Program of the European Union (N.O.-E.).
H.Y., who provided the suggestions for FSEA, is a researcher at Human Metabolome Technologies Inc.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Integrated supplementary information
a, The diagram presents coverage in mass spectral databases when CHNOSP elements are considered. b, Principal component analysis (PCA) was performed with PubChem fingerprints for the metabolite structures to clarify the diversity of chemical properties in PlaSMA metabolites. The x- and y-axes show the score values of the first and second principal components. The sizes of spectral records in PlaSMA, MassBank, ReSpect, GNPS, and MetaboBASE are 406, 1,768, 642, 2,501, and 258, respectively, where unique structures were defined by the first layer of InChIKey. c, The retention time and the exact mass of PlaSMA metabolites were mapped. Colors indicate the metabolite class defined by the superclass of ClassyFire.
a, Labeling rates were examined by a total of 954 metabolite features assigned as levels 2.1, 2.2 and 3.1. The rate was calculated by the equation 13C intensity/(12C + 13C intensities), where the ion abundances were confirmed in plant tissue with maximum intensity among the samples described in Supplementary Table 6. The bar chart describes the average and standard deviation for labeling rates of metabolites in each plant tissue. b, The relationship of m/z and ion abundance with the labeling rate was examined. The x- and y-axes show the m/z value and its labeling rate; the bubble size denotes the scale of ion abundance.
MS-FINDER accuracies were evaluated for the formula, the metabolite class, and the structure predictions in publicly available databases containing 4,430 and 1,741 high-quality tandem mass spectral (HQ-MS/MS) records, respectively, in positive and negative ion mode. The search for molecular structures yielded ontology predictions for the second panel; FSEA returned ontology predictions without requiring information from the structural database. The x- and y-axes show the ranking of correct structures and the accuracy of the characterized features.
Supplementary Figure 4 Methodology for the creation of the fragment database for FSEA and molecular fingerprints.
High-quality mass spectral records were extracted on the basis of ion abundance and mass accuracy criteria. Duplicate records for the same metabolite were merged by fragment formula consistency; the ion abundance was simply summed. Substructures were assigned for the product ions and neutral losses; fragment ion structures were managed using the neutralized form with the rearranged hydrogen value. All structures and substructures were managed with the first layer of InChIKey, followed by conversion into the ontology term. Fragments were recorded as ‘frequently observed’ when ion or neutral loss exceeded 0.2% among the merged MS/MS data. Frequently observed fragments were used in the set subjected to FSEA; the query was managed using m/z or the neutral loss formula, and InChIKey and ontology candidates.
a, The concepts underlying FSEA and gene set enrichment analysis (GSEA) are compared using phosphatidylcholine (PC). When significant characteristic fragments used to recommend the PC class are detected, the unknown MS/MS spectrum is recommended as a PC class; the estimated P value is derived with the Fisher exact test (one-sided). b, The significance of peak features is currently defined as a more than 5% relative abundance. The peaks are annotated using the fragment ontology database; formula and ontology candidate information is used for querying m/z and neutral losses. A 2 × 2 cross-tabulation table is prepared with the aid of annotated information; insignificant peak features are defined using several methods whose principles involve different specificities and precisions. Finally, using the table, P values are estimated with the Fisher exact test (one-sided).
Supplementary Figure 6 Further explanation of how to annotate C20H33N3O13S as N-fructosyl S-(2-carboxypropyl) glutathione.
a, Neutral losses of 90 and 120 Da are frequently observed in the spectrum of C-linked flavonoids, whereas the loss of 162 Da is observed in the spectrum of O-linked flavonoids. b, Neutral losses of 90, 120, and 162 Da are commonly observed in the mass fragmentation of N-hexosyl compounds; such losses were also found in the MS/MS spectrum of C20H33N3O13S. c, The FSEA algorithm recommended the metabolite classes as ‘peptides’ (P value = 1.86 × 10–5, one-sided) and ‘amino acid derivatives’ (P value = 0.00276, one-sided). d, The spectrum of S-(2-carboxylpropyl)glutathione, which was linked to C20H33N3O13S by the MS/MS similarity in the integrated molecular network, is shown. The formula and InChIKey were assigned to product ions that were also observed in the spectrum of C20H33N3O13S. e, The discovery of N-fructosyl amino acids isolated from the bulbs of the Allium genus has been reported on the literature. On the basis of such findings, the chemical structure of C20H33N3O13S was predicted as N-fructosyl S-(2-carboxypropyl) glutathione.
Supplementary Figure 7 Total synthesis of N-fructosyl S-(2-carboxypropyl) glutathione, and confirmation by LC-MS/MS.
The results were confirmed three times independently.
Supplementary Figure 8 Metabolome analysis of Arabidopsis thaliana accessions using the enriched spectral database.
a, A. thaliana accessions (n = 25) were mapped by the latitude and longitude of their collection site. The color shows the well-defined post-germination FT trait. b, OPLS-DA score plots were prepared to classify early/late FT phenotypes with the shoot and root metabolomes; 131 and 185 metabolites were annotated using the accumulated spectral library we curated. c, The plot presents details of shoot and root metabolomes and their correlation with various traits. Circuit A shows the metabolites were categorized into 11 classes; circuit B is the heat map of the metabolite profiles in 25 accessions. The cycle size in circuit C indicates the fold change between maximum and minimum ion abundances among the accessions. The –log10 value of the Q-value, adjusted with the Benjamini–Hochberg method in the Mann–Whitney U-test (two-sided) for early/late FT traits, is presented as a bar chart in circuit D. The estimated values for other traits, including the phenotypes elicited by nitrogen depletion and drought stress, and the morphological traits of rosette branching and plant height, obtained with the same method, are presented in line charts in circuit E. The associated metabolites, defined by a greater than 0.85 correlation coefficient, are linked in circuit F. Asterisks indicate that the metabolites were annotated as level 2.2 with the assistance of the workflow presented in this study.
Supplementary Figures 1–8, Supplementary Note 1
Summary of our ten-step cheminformatics methodology.
Details on PlaSMA standard chemicals.
Details of the PubChem fingerprint used to describe Supplementary Fig. 1b.
Details for fragment database and FSEA sets.
Summary of metabolite class information to confirm biological significance.
Summary of 2,382 ESI(+) and 1,731 ESI(–) metabolite features including m/z, the retention time (min), MS/MS spectrum, and ion abundances among plants.
Summary of metabolome analysis for Arabidopsis thaliana accessions.
Detail of extracted SNPs relevant to the flowering time trait.
The catalog and lot numbers for 31 tissues in 12 plants.
Detail of extracted SNPs relevant to the methylthiobutyl glucosinolate profile.
Details for 1,475 molecular fragment fingerprints used in MS-FINDER.
Detail for the biotransformation dictionary.
About this article
Mass Spectrometry Data Repository Enhances Novel Metabolite Discoveries with Advances in Computational Metabolomics
Current Opinion in Systems Biology (2019)
Comparison of the Phytochemical Composition of Serenoa repens Extracts by a Multiplexed Metabolomic Approach