A cheminformatics approach to characterize metabolomes in stable-isotope-labeled organisms

Abstract

We report a computational approach (implemented in MS-DIAL 3.0; http://prime.psc.riken.jp/) for metabolite structure characterization using fully 13C-labeled and non-labeled plants and LC–MS/MS. Our approach facilitates carbon number determination and metabolite classification for unknown molecules. Applying our method to 31 tissues from 12 plant species, we assigned 1,092 structures and 344 formulae to 3,604 carbon-determined metabolite ions, 69 of which were found to represent structures currently not listed in metabolome databases.

Access optionsAccess options

Rent or Buy article

Get time limited or full article access on ReadCube.

from$8.99

All prices are NET prices.

Fig. 1: Cheminformatics strategy for comprehensive metabolite characterization.
Fig. 2: Structures characterized with computational MS and human curation.

Code availability

All program packages and source codes are freely available at http://prime.psc.riken.jp/.

Data availability

All data resources are freely available at http://prime.psc.riken.jp/. The DOI in Metabolomics Workbench is https://doi.org/10.21228/M8XM40.

Change history

  • 16 April 2019

    In the originally published Supplementary Information for this paper, the files presented as Supplementary Tables 3, 4, and 7 were duplicates of Supplementary Tables 5, 6, and 9, respectively. All Supplementary Table files are now correct online.

References

  1. 1.

    Tsugawa, H. Curr. Opin. Biotechnol. 54, 10–17 (2018).

  2. 2.

    Harvey, A. L., Edrada-Ebel, R. & Quinn, R. J. Nat. Rev. Drug. Discov. 14, 111–129 (2015).

  3. 3.

    Banerjee, P. et al. Nucleic Acids Res. 43, D935–D939 (2015).

  4. 4.

    Wang, M. et al. Nat. Biotechnol. 34, 828–837 (2016).

  5. 5.

    Mahieu, N. G. & Patti, G. J. Anal. Chem. 89, 10397–10406 (2017).

  6. 6.

    Giavalisco, P. et al. Plant J. 68, 364–376 (2011).

  7. 7.

    Chokkathukalam, A., Kim, D. H., Barrett, M. P., Breitling, R. & Creek, D. J. Bioanalysis 6, 511–524 (2014).

  8. 8.

    Sumner, L. W. et al. Metabolomics. 3, 211–221 (2007).

  9. 9.

    Tsugawa, H. et al. Anal. Chem. 88, 7946–7958 (2016).

  10. 10.

    Kim, S.-Y. & Volsky, D. J. BMC Bioinformatics 6, 144 (2005).

  11. 11.

    Falkner, J. A., Falkner, J. W., Yocum, A. K. & Andrews, P. C. J. Proteome Res. 7, 4614–4622 (2008).

  12. 12.

    Maciel, E. et al. Chem. Phys. Lipids 174, 1–7 (2013).

  13. 13.

    Nakabayashi, R. et al. Anal. Chem. 85, 1310–1315 (2013).

  14. 14.

    Mütsch-Eckner, M., Erdelmeier, C. A., Sticher, O. & Reuter, H. D. J. Nat. Prod. 56, 864–869 (1993).

  15. 15.

    Kooke, R. et al. Plant Physiol. 170, 2187–2203 (2016).

  16. 16.

    Krasensky, J. & Jonak, C. J. Exp. Bot. 63, 1593–1608 (2012).

  17. 17.

    Tsugawa, H., Ikeda, K. & Arita, M. Biochim. Biophys. Acta Mol. Cell. Biol. Lipids 1862, 762–765 (2017).

  18. 18.

    Warth, B. et al. Anal. Chem. 89, 11505–11513 (2017).

  19. 19.

    Scheubert, K. et al. Nat. Commun. 8, 1494 (2017).

  20. 20.

    Markley, J. L. et al. Curr. Opin. Biotechnol. 43, 34–40 (2017).

  21. 21.

    Saito, K. et al. Plant Cell Rep. 20, 267–271 (2001).

  22. 22.

    Udomsom, N. et al. Front. Plant Sci. 7, 1–14 (2016).

  23. 23.

    Bac-Molenaar, J. A., Vreugdenhil, D., Granier, C. & Keurentjes, J. J. B. J. Exp. Bot. 66, 5567–5580 (2015).

  24. 24.

    Bac-Molenaar, J. A., Granier, C., Keurentjes, J. J. B. & Vreugdenhil, D. Plant Cell Environ. 39, 88–102 (2016).

  25. 25.

    Begley, P. et al. Anal. Chem. 81, 7038–7046 (2009).

  26. 26.

    Huang, Z. et al. J. Med. Chem. 61, 1833–1844 (2018).

  27. 27.

    Han, L. K. & Saito, M. Japan Patent JP5507437B2 (2010).

  28. 28.

    Tsugawa, H. et al. Nat. Methods 12, 523–526 (2015).

  29. 29.

    Domingo-Almenara, X., Montenegro-Burke, J. R., Benton, H. P. & Siuzdak, G. Anal. Chem. 90, 480–489 (2018).

  30. 30.

    Djoumbou Feunang, Y. et al. J. Cheminform. 8, 1–20 (2016).

  31. 31.

    Guijas, C. et al. Anal. Chem. 90, 3156–3164 (2018).

  32. 32.

    Horai, H. et al. J. Mass. Spectrom. 45, 703–714 (2010).

  33. 33.

    Sawada, Y. et al. Phytochemistry 82, 38–45 (2012).

  34. 34.

    Qiu, F., Fine, D. D., Wherritt, D. J., Lei, Z. & Sumner, L. W. Anal. Chem. 88, 11373–11383 (2016).

  35. 35.

    Lai, Z. et al. Nat. Methods 15, 53–56 (2018).

  36. 36.

    Subramanian, A. et al. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

  37. 37.

    Morreel, K. et al. Plant Cell 26, 929–945 (2014).

Download references

Acknowledgements

This work was partially supported by KAKENHI (JSPS 15K01812 (H.T.), 15H05897 (M.A.), 16H06454 (M.Y.), 17H03621 (M.A.), 18H02432 (H.T.), 18K19155 (H.T.); MHLW 30190401 (M.A.)), MAFF Science and Technology Research Promotion Program for Agriculture, Forestry, Fisheries and Food Industry (R.N.), JST the Strategic International Research Cooperative Program (R.N.), JST National Bioscience Database Center (M.A.), JST the Strategic International Collaborative Research Program (K.S.), Chiba University the GP Program (K.S.) and the Japan Advanced Plant Science Network (K.S.). We thank the HMDB, LipidMAPS and PMN consortiums for providing the SDF files; ChemAxon for a free research license for the Marvin and JChem cheminformatics tools; K. Yonekura-Sakakibara, Y. Higashi, Y. Sawada and K. Mochida for discussions on plant metabolomics; M. Yokota Hirai for providing the place for compound synthesis (RIKEN CSRS); S. Tsugawa for data curation (RIKEN IMS); and U. Petralia for English editing. This work was also supported by the Centre of BioSystems Genomics (no. AA3-WU-PL) (R.K.), the ‘Learning from Nature’ program funded by the Dutch Technology Foundation (STW grant number 10996), which is part of the Netherlands Organization for Scientific Research (J.A.B.-M.), the European Plant Phenotyping Network (EPPN, grant agreement no. 284443) funded by the FP7 Research Infrastructures Program of the European Union (J.A.B.-M.) and the NUE-CROPS project (grant agreement no. 222645) funded by the FP7-CP-IP Research Infrastructures Program of the European Union (N.O.-E.).

Author information

H.T., R.N., M.A. and K.S. designed the research. H.T. developed all computational programs for metabolome annotations. R.N. and T.M. analyzed the biological samples. R.S. performed the synthesis of compounds. A.R., T.N. and M.Y. prepared the O. pumila material. H.T. suggested FSEA, and H.Y. provided fruitful suggestions. Y.Y. created the PlaSMA website. H.T., R.N., T.M., A.R. and M.T. contributed to the curation of structure elucidations. R.K., J.A.B.-M., N.O.-E. and J.J.B.K. shared the seeds of A. thaliana accessions and performed the phenotyping experiments. H.T. wrote the manuscript and H.T., R.N., A.R., J.J.B.K., M.A. and K.S. thoroughly discussed this project and helped to improve the manuscript.

Correspondence to Hiroshi Tsugawa or Kazuki Saito.

Ethics declarations

Competing interests

H.Y., who provided the suggestions for FSEA, is a researcher at Human Metabolome Technologies Inc.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Summary of plant specialized metabolome (PlaSMA) database.

a, The diagram presents coverage in mass spectral databases when CHNOSP elements are considered. b, Principal component analysis (PCA) was performed with PubChem fingerprints for the metabolite structures to clarify the diversity of chemical properties in PlaSMA metabolites. The x- and y-axes show the score values of the first and second principal components. The sizes of spectral records in PlaSMA, MassBank, ReSpect, GNPS, and MetaboBASE are 406, 1,768, 642, 2,501, and 258, respectively, where unique structures were defined by the first layer of InChIKey. c, The retention time and the exact mass of PlaSMA metabolites were mapped. Colors indicate the metabolite class defined by the superclass of ClassyFire.

Supplementary Figure 2 Evaluation of labeling rates in 12 plant species.

a, Labeling rates were examined by a total of 954 metabolite features assigned as levels 2.1, 2.2 and 3.1. The rate was calculated by the equation 13C intensity/(12C + 13C intensities), where the ion abundances were confirmed in plant tissue with maximum intensity among the samples described in Supplementary Table 6. The bar chart describes the average and standard deviation for labeling rates of metabolites in each plant tissue. b, The relationship of m/z and ion abundance with the labeling rate was examined. The x- and y-axes show the m/z value and its labeling rate; the bubble size denotes the scale of ion abundance.

Supplementary Figure 3 Evaluation of MS-FINDER structure elucidations.

MS-FINDER accuracies were evaluated for the formula, the metabolite class, and the structure predictions in publicly available databases containing 4,430 and 1,741 high-quality tandem mass spectral (HQ-MS/MS) records, respectively, in positive and negative ion mode. The search for molecular structures yielded ontology predictions for the second panel; FSEA returned ontology predictions without requiring information from the structural database. The x- and y-axes show the ranking of correct structures and the accuracy of the characterized features.

Supplementary Figure 4 Methodology for the creation of the fragment database for FSEA and molecular fingerprints.

High-quality mass spectral records were extracted on the basis of ion abundance and mass accuracy criteria. Duplicate records for the same metabolite were merged by fragment formula consistency; the ion abundance was simply summed. Substructures were assigned for the product ions and neutral losses; fragment ion structures were managed using the neutralized form with the rearranged hydrogen value. All structures and substructures were managed with the first layer of InChIKey, followed by conversion into the ontology term. Fragments were recorded as ‘frequently observed’ when ion or neutral loss exceeded 0.2% among the merged MS/MS data. Frequently observed fragments were used in the set subjected to FSEA; the query was managed using m/z or the neutral loss formula, and InChIKey and ontology candidates.

Supplementary Figure 5 FSEA concept and methodology.

a, The concepts underlying FSEA and gene set enrichment analysis (GSEA) are compared using phosphatidylcholine (PC). When significant characteristic fragments used to recommend the PC class are detected, the unknown MS/MS spectrum is recommended as a PC class; the estimated P value is derived with the Fisher exact test (one-sided). b, The significance of peak features is currently defined as a more than 5% relative abundance. The peaks are annotated using the fragment ontology database; formula and ontology candidate information is used for querying m/z and neutral losses. A 2 × 2 cross-tabulation table is prepared with the aid of annotated information; insignificant peak features are defined using several methods whose principles involve different specificities and precisions. Finally, using the table, P values are estimated with the Fisher exact test (one-sided).

Supplementary Figure 6 Further explanation of how to annotate C20H33N3O13S as N-fructosyl S-(2-carboxypropyl) glutathione.

a, Neutral losses of 90 and 120 Da are frequently observed in the spectrum of C-linked flavonoids, whereas the loss of 162 Da is observed in the spectrum of O-linked flavonoids. b, Neutral losses of 90, 120, and 162 Da are commonly observed in the mass fragmentation of N-hexosyl compounds; such losses were also found in the MS/MS spectrum of C20H33N3O13S. c, The FSEA algorithm recommended the metabolite classes as ‘peptides’ (P value = 1.86 × 10–5, one-sided) and ‘amino acid derivatives’ (P value = 0.00276, one-sided). d, The spectrum of S-(2-carboxylpropyl)glutathione, which was linked to C20H33N3O13S by the MS/MS similarity in the integrated molecular network, is shown. The formula and InChIKey were assigned to product ions that were also observed in the spectrum of C20H33N3O13S. e, The discovery of N-fructosyl amino acids isolated from the bulbs of the Allium genus has been reported on the literature. On the basis of such findings, the chemical structure of C20H33N3O13S was predicted as N-fructosyl S-(2-carboxypropyl) glutathione.

Supplementary Figure 7 Total synthesis of N-fructosyl S-(2-carboxypropyl) glutathione, and confirmation by LC-MS/MS.

The results were confirmed three times independently.

Supplementary Figure 8 Metabolome analysis of Arabidopsis thaliana accessions using the enriched spectral database.

a, A. thaliana accessions (n = 25) were mapped by the latitude and longitude of their collection site. The color shows the well-defined post-germination FT trait. b, OPLS-DA score plots were prepared to classify early/late FT phenotypes with the shoot and root metabolomes; 131 and 185 metabolites were annotated using the accumulated spectral library we curated. c, The plot presents details of shoot and root metabolomes and their correlation with various traits. Circuit A shows the metabolites were categorized into 11 classes; circuit B is the heat map of the metabolite profiles in 25 accessions. The cycle size in circuit C indicates the fold change between maximum and minimum ion abundances among the accessions. The –log10 value of the Q-value, adjusted with the Benjamini–Hochberg method in the Mann–Whitney U-test (two-sided) for early/late FT traits, is presented as a bar chart in circuit D. The estimated values for other traits, including the phenotypes elicited by nitrogen depletion and drought stress, and the morphological traits of rosette branching and plant height, obtained with the same method, are presented in line charts in circuit E. The associated metabolites, defined by a greater than 0.85 correlation coefficient, are linked in circuit F. Asterisks indicate that the metabolites were annotated as level 2.2 with the assistance of the workflow presented in this study.

Supplementary information

Supplementary Information

Supplementary Figures 1–8, Supplementary Note 1

Reporting Summary

Supplementary Table 1

Summary of our ten-step cheminformatics methodology.

Supplementary Table 2

Details on PlaSMA standard chemicals.

Supplementary Table 3

Details of the PubChem fingerprint used to describe Supplementary Fig. 1b.

Supplementary Table 4

Details for fragment database and FSEA sets.

Supplementary Table 5

Summary of metabolite class information to confirm biological significance.

Supplementary Table 6

Summary of 2,382 ESI(+) and 1,731 ESI(–) metabolite features including m/z, the retention time (min), MS/MS spectrum, and ion abundances among plants.

Supplementary Table 7

Summary of metabolome analysis for Arabidopsis thaliana accessions.

Supplementary Table 8

Detail of extracted SNPs relevant to the flowering time trait.

Supplementary Table 9

The catalog and lot numbers for 31 tissues in 12 plants.

Supplementary Table 10

Detail of extracted SNPs relevant to the methylthiobutyl glucosinolate profile.

Supplementary Table 11

Details for 1,475 molecular fragment fingerprints used in MS-FINDER.

Supplementary Table 12

Detail for the biotransformation dictionary.

Source data

Source Data Fig. 2

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Further reading