DecoID improves identification rates in metabolomics through database-assisted MS/MS deconvolution

Abstract

Chimeric MS/MS spectra contain fragments from multiple precursor ions and therefore hinder compound identification in metabolomics. Historically, deconvolution of these chimeric spectra has been challenging and relied on specific experimental methods that introduce variation in the ratios of precursor ions between multiple tandem mass spectrometry (MS/MS) scans. DecoID provides a complementary, method-independent approach where database spectra are computationally mixed to match an experimentally acquired spectrum by using LASSO regression. We validated that DecoID increases the number of identified metabolites in MS/MS datasets from both data-independent and data-dependent acquisition without increasing the false discovery rate. We applied DecoID to publicly available data from the MetaboLights repository and to data from human plasma, where DecoID increased the number of identified metabolites from data-dependent acquisition data by over 30% compared to direct spectral matching. DecoID is compatible with any user-defined MS/MS database and provides automated searching for some of the largest MS/MS databases currently available.
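The idea of mixing database spectra to match a chimeric spectrum can be illustrated with a minimal sketch. It is not the DecoID implementation. The three library compounds, the unit-width m/z binning, the penalty value `lam` and the small coordinate-descent solver are all assumptions made for illustration; the solver is a dependency-free stand-in for a library LASSO routine. The sketch fits a non-negative LASSO, subtracts the fitted contaminant, and compares the cosine score of the target before and after deconvolution.

```python
import math

def bin_spectrum(peaks, mz_max=200, bin_width=1.0):
    """Convert (m/z, intensity) peaks into a fixed-length intensity vector."""
    n_bins = int(mz_max / bin_width)
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz / bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    return vec

def nonneg_lasso(library, query, lam=0.01, n_iter=200):
    """Coordinate descent for
    min_x 0.5 * ||query - sum_j x_j * library[j]||^2 + lam * sum_j x_j, x >= 0."""
    coefs = [0.0] * len(library)
    residual = list(query)  # query minus the current mixture (all coefs start at 0)
    norms = [sum(v * v for v in spec) for spec in library]
    for _ in range(n_iter):
        for j, spec in enumerate(library):
            if norms[j] == 0.0:
                continue
            # Correlation of spectrum j with the residual, excluding its own contribution
            rho = sum(s * r for s, r in zip(spec, residual)) + norms[j] * coefs[j]
            new = max(0.0, (rho - lam) / norms[j])  # soft-threshold, clipped at zero
            delta = new - coefs[j]
            if delta != 0.0:
                residual = [r - delta * s for r, s in zip(residual, spec)]
                coefs[j] = new
    return coefs

def cosine(u, v):
    """Cosine similarity between two intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical library spectra (made-up peaks, not real compounds)
library = [
    bin_spectrum([(41, 0.2), (69, 1.0), (87, 0.5)]),   # contaminant "A"
    bin_spectrum([(55, 1.0), (87, 0.4), (113, 0.6)]),  # target "B"
    bin_spectrum([(69, 0.3), (99, 1.0), (127, 0.8)]),  # unrelated "C"
]

# Simulated chimeric spectrum: 70% A co-fragmented with 30% B
query = [0.7 * a + 0.3 * b for a, b in zip(library[0], library[1])]

coefs = nonneg_lasso(library, query)

# Deconvolved spectrum for the target: remove the fitted contributions of the others
target = 1
cleaned = list(query)
for j, spec in enumerate(library):
    if j != target:
        cleaned = [max(0.0, c - coefs[j] * s) for c, s in zip(cleaned, spec)]

direct_score = cosine(query, library[target])
cleaned_score = cosine(cleaned, library[target])
print("mixing coefficients:", [round(c, 3) for c in coefs])
print("direct match score:", round(direct_score, 3))
print("score after deconvolution:", round(cleaned_score, 3))
```

In this toy case the fit recovers mixing weights close to the simulated 0.7 and 0.3 and assigns no weight to the unrelated spectrum, so the target scores much higher after deconvolution than by direct matching. Real data would need mass-tolerance-aware peak matching and a tuned penalty, which this sketch does not attempt.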

Fig. 1: Deconvolution with DecoID to identify metabolites with chimeric MS/MS spectra.
Fig. 2: Orphan isotopologue contamination and MS/MS spectrum prediction.
Fig. 3: DecoID improves metabolite identification in the DDA and DIA IROA datasets.
Fig. 4: DecoID improves identification rates in NIST SRM 1950.
Fig. 5: DecoID increases the identification rate in a human plasma DIA dataset.
Fig. 6: DecoID increases the identification rate of metabolites in a publicly available mouse xenograft RPLC/MS/MS dataset.

Data availability

All MS/MS data used in the evaluation of DecoID have been uploaded to the MetaboLights repository as study MTBLS2207 and are also available on the DecoID GitHub release (https://github.com/e-stan/DecoID/releases/). The publicly available dataset analyzed is available on MetaboLights as study MTBLS1066 (all reversed-phase, negative-mode data files were used). The MS/MS databases applied can be obtained at the curators’ websites (https://mona.fiehnlab.ucdavis.edu/, https://www.mzcloud.org/ and https://hmdb.ca/). The in-house IROA metabolite database is available within the DecoID release on GitHub (https://github.com/e-stan/DecoID/releases/), and the reference spectra have been uploaded to MoNA (submitter: E.S.; origin file: IROA_DB_for_mona_filtered_exported_addedInfo.msp).

Code availability

Source code is available on Zenodo (ref. 40) and GitHub (https://github.com/e-stan/DecoID/). Included is an example dataset along with documentation for both the DecoID Python package and user interface. A standalone executable built for Windows can alternatively be downloaded from the Patti Lab website (http://pattilab.wustl.edu/software/DecoID/).

References

  1. Blaženović, I., Kind, T., Ji, J. & Fiehn, O. Software tools and approaches for compound identification of LC–MS/MS data in metabolomics. Metabolites 8, 31 (2018).

  2. Baker, E. S. & Patti, G. J. Perspectives on data analysis in metabolomics: points of agreement and disagreement from the 2018 ASMS fall workshop. J. Am. Soc. Mass Spectrom. https://doi.org/10.1007/s13361-019-02295-3 (2019).

  3. Nikolskiy, I., Mahieu, N. G., Chen, Y.-J., Tautenhahn, R. & Patti, G. J. An untargeted metabolomic workflow to improve structural characterization of metabolites. Anal. Chem. 85, 7713–7719 (2013).

  4. Nash, W. J. & Dunn, W. B. From mass to metabolite in human untargeted metabolomics: recent advances in annotation of metabolites applying liquid chromatography–mass spectrometry data. Trends Analyt. Chem. 120, 115324 (2019).

  5. Tsugawa, H. et al. MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat. Methods 12, 523–526 (2015).

  6. Samanipour, S., Reid, M. J., Bæk, K. & Thomas, K. V. Combining a deconvolution and a universal library search algorithm for the nontarget analysis of data-independent acquisition mode liquid chromatography−high-resolution mass spectrometry results. Environ. Sci. Technol. 52, 4694–4701 (2018).

  7. Li, H., Cai, Y., Guo, Y., Chen, F. & Zhu, Z.-J. MetDIA: targeted metabolite extraction of multiplexed MS/MS spectra generated by data-independent acquisition. Anal. Chem. 88, 8757–8764 (2016).

  8. Yin, Y., Wang, R., Cai, Y., Wang, Z. & Zhu, Z.-J. DecoMetDIA: deconvolution of multiplexed MS/MS spectra for metabolite identification in SWATH-MS-based untargeted metabolomics. Anal. Chem. 91, 11897–11904 (2019).

  9. Ting, Y. S. et al. PECAN: library-free peptide detection for data-independent acquisition tandem mass spectrometry data. Nat. Methods 14, 903–908 (2017).

  10. Tsou, C.-C. et al. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat. Methods 12, 258–264 (2015).

  11. Wang, J. et al. MSPLIT-DIA: sensitive peptide identification for data-independent acquisition. Nat. Methods 12, 1106–1108 (2015).

  12. Zhang, B., Pirmoradian, M., Chernobrovkin, A. & Zubarev, R. A. DeMix workflow for efficient identification of cofragmented peptides in high-resolution data-dependent tandem mass spectrometry. Mol. Cell. Proteomics 13, 3211–3223 (2014).

  13. Dorfer, V., Maltsev, S., Winkler, S. & Mechtler, K. CharmeRT: boosting peptide identifications by chimeric spectra identification and retention time prediction. J. Proteome Res. 17, 2581–2589 (2018).

  14. Houel, S. et al. Quantifying the impact of chimera MS/MS spectra on peptide identification in large-scale proteomics studies. J. Proteome Res. 9, 4152–4160 (2010).

  15. Haug, K. et al. MetaboLights: a resource evolving in response to the needs of its scientific community. Nucleic Acids Res. 48, D440–D444 (2020).

  16. Sud, M. et al. Metabolomics Workbench: an international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training and analysis tools. Nucleic Acids Res. 44, D463–D470 (2016).

  17. Kind, T. et al. Identification of small molecules using accurate mass MS/MS search. Mass Spectrom. Rev. 37, 513–532 (2017).

  18. Zhu, X., Chen, Y. & Subramanian, R. Comparison of information-dependent acquisition, SWATH and MSAll techniques in metabolite identification study employing ultrahigh-performance liquid chromatography–quadrupole time-of-flight mass spectrometry. Anal. Chem. 86, 1202–1209 (2014).

  19. Lawson, T. N. et al. msPurity: automated evaluation of precursor ion purity for mass spectrometry-based fragmentation in metabolomics. Anal. Chem. 89, 2432–2439 (2017).

  20. Peckner, R. et al. Specter: linear deconvolution for targeted analysis of data-independent acquisition mass spectrometry proteomics. Nat. Methods 15, 371–378 (2018).

  21. Kessner, D., Chambers, M., Burke, R., Agus, D. & Mallick, P. ProteoWizard: open-source software for rapid proteomics tools development. Bioinformatics 24, 2534–2536 (2008).

  22. Wishart, D. S. et al. HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res. 46, D608–D617 (2018).

  23. Horai, H. et al. MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass Spectrom. 45, 703–714 (2010).

  24. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58, 267–288 (1996).

  25. Vinaixa, M. et al. Mass spectral databases for LC/MS- and GC/MS-based metabolomics: state of the field and future prospects. Trends Analyt. Chem. 78, 23–35 (2016).

  26. Cho, K. et al. isoMETLIN: a database for isotope-based metabolomics. Anal. Chem. 86, 9358–9361 (2014).

  27. Bonner, R. & Hopfgartner, G. SWATH data independent acquisition mass spectrometry for metabolomics. Trends Analyt. Chem. https://doi.org/10.1016/j.trac.2018.10.014 (2018).

  28. Telu, K. H., Yan, X., Wallace, W. E., Stein, S. E. & Simón‐Manso, Y. Analysis of human plasma metabolites across different liquid chromatography/mass spectrometry platforms: cross-platform transferable chemical signatures. Rapid Commun. Mass Spectrom. 30, 581–593 (2016).

  29. Schymanski, E. L. et al. Identifying small molecules via high-resolution mass spectrometry: communicating confidence. Environ. Sci. Technol. 48, 2097–2098 (2014).

  30. Fiehn, O. et al. The metabolomics standards initiative (MSI). Metabolomics 3, 175–178 (2007).

  31. Licha, D. et al. Untargeted metabolomics reveals molecular effects of ketogenic diet on healthy and tumor xenograft mouse models. Int. J. Mol. Sci. 20, 3873 (2019).

  32. Spalding, J. L., Naser, F. J., Mahieu, N. G., Johnson, S. L. & Patti, G. J. Trace phosphate improves ZIC-pHILIC peak shape, sensitivity and coverage for untargeted metabolomics. J. Proteome Res. 17, 3537–3546 (2018).

  33. Heller, S., McNaught, A., Stein, S., Tchekhovskoi, D. & Pletnev, I. InChI—the worldwide chemical structure identifier standard. J. Cheminform. 5, 7 (2013).

  34. Smith, C. A., Want, E. J., O’Maille, G., Abagyan, R. & Siuzdak, G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification. Anal. Chem. 78, 779–787 (2006). https://pubs.acs.org/doi/abs/10.1021/ac051437y

  35. Stein, S. E. & Scott, D. R. Optimization and testing of mass spectral library search algorithms for compound identification. J. Am. Soc. Mass. Spectrom. 5, 859–866 (1994).

  36. Chen, Y. & Wang, M. Hardness of approximation for sparse optimization with L0 norm. Technical Report (2016).

  37. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  38. Cho, K. et al. Targeting unique biological signals on the fly to improve MS/MS coverage and identification efficiency in metabolomics. Anal. Chim. Acta 1149, 338210 (2021).

  39. Tautenhahn, R., Böttcher, C. & Neumann, S. Highly sensitive feature detection for high-resolution LC/MS. BMC Bioinformatics 9, 504 (2008).

  40. Stancliffe, E. e-stan/DecoID: DecoID. https://doi.org/10.5281/zenodo.4783380 (Zenodo, 2021).

Acknowledgements

This work was supported by funding from the National Institutes of Health grants U01 CA235482 (to G.J.P.), R35 ES028365 (to G.J.P.) and R24 OD024624 (to G.J.P.).

Author information

Contributions

G.J.P. and E.S. conceptualized the project. E.S. wrote the source code and performed data analysis with input from G.J.P. and M.S.-H. M.S. and M.S.-H. acquired the MS/MS data and prepared all samples. E.S. and G.J.P. wrote the manuscript. All authors provided comments during manuscript preparation.

Corresponding author

Correspondence to Gary J. Patti.

Ethics declarations

Competing interests

G.J.P. is a scientific advisory board member for Cambridge Isotope Laboratories and has a research collaboration agreement with Thermo Fisher Scientific.

Additional information

Peer review information: Nature Methods thanks Mingxun Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Arunima Singh was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Stancliffe, E., Schwaiger-Haber, M., Sindelar, M. et al. DecoID improves identification rates in metabolomics through database-assisted MS/MS deconvolution. Nat Methods 18, 779–787 (2021). https://doi.org/10.1038/s41592-021-01195-3

  • DOI: https://doi.org/10.1038/s41592-021-01195-3
