Abstract
Chimeric MS/MS spectra contain fragments from multiple precursor ions and therefore hinder compound identification in metabolomics. Historically, deconvolution of these chimeric spectra has been challenging and relied on specific experimental methods that introduce variation in the ratios of precursor ions between multiple tandem mass spectrometry (MS/MS) scans. DecoID provides a complementary, method-independent approach where database spectra are computationally mixed to match an experimentally acquired spectrum by using LASSO regression. We validated that DecoID increases the number of identified metabolites in MS/MS datasets from both data-independent and data-dependent acquisition without increasing the false discovery rate. We applied DecoID to publicly available data from the MetaboLights repository and to data from human plasma, where DecoID increased the number of identified metabolites from data-dependent acquisition data by over 30% compared to direct spectral matching. DecoID is compatible with any user-defined MS/MS database and provides automated searching for some of the largest MS/MS databases currently available.
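The database-mixing idea described above can be illustrated with a small, self-contained sketch. It is not DecoID's implementation. It assumes spectra have already been binned into equal-length intensity vectors, and it uses a hand-written non-negative LASSO solver (cyclic coordinate descent). The function name `nonneg_lasso`, the three library entries, the bin values and the penalty `alpha` are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: fit a chimeric spectrum as a sparse, non-negative
# mixture of library spectra. All names and numbers here are hypothetical.

def nonneg_lasso(columns, target, alpha=0.001, n_iter=200):
    """Minimise 0.5 * ||target - sum_j w_j * columns[j]||^2 + alpha * sum_j w_j
    subject to w_j >= 0, using cyclic coordinate descent."""
    weights = [0.0] * len(columns)
    residual = list(target)  # equals target - mixture while all weights are 0
    norms = [sum(v * v for v in col) for col in columns]
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            if norms[j] == 0.0:
                continue  # an empty library spectrum cannot explain anything
            # Correlation of column j with the residual that excludes its own term.
            rho = sum(c * r for c, r in zip(col, residual)) + norms[j] * weights[j]
            # Soft-threshold by alpha and clip at zero (non-negativity).
            new_w = max(0.0, (rho - alpha) / norms[j])
            delta = new_w - weights[j]
            if delta != 0.0:
                residual = [r - delta * c for r, c in zip(residual, col)]
                weights[j] = new_w
    return weights


# Hypothetical binned library spectra (one intensity per m/z bin).
library = {
    "compound_A": [1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
    "compound_B": [0.0, 0.0, 1.0, 0.4, 0.0, 0.0],
    "compound_C": [0.0, 0.5, 0.0, 0.0, 1.0, 0.3],
}

# Simulated chimeric spectrum: 70% compound_A plus 30% compound_B.
query = [0.7 * a + 0.3 * b
         for a, b in zip(library["compound_A"], library["compound_B"])]

names = list(library)
weights = nonneg_lasso([library[n] for n in names], query)

for name, w in zip(names, weights):
    print(f"{name}: {w:.3f}")
```

In this toy case the fitted weights for `compound_A` and `compound_B` come out close to 0.7 and 0.3 (slightly shrunk by the penalty), while `compound_C` receives a weight of zero even though it shares a fragment bin with `compound_A`. Components with non-zero weight could then be subtracted from the query or scored individually against the library.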
Data availability
All MS/MS data used in the evaluation of DecoID have been uploaded to the MetaboLights repository as study MTBLS2207 and are also available on the DecoID GitHub release (https://github.com/e-stan/DecoID/releases/). The publicly available dataset analyzed can be found on MetaboLights as study MTBLS1066 (all reversed-phase, negative-mode data files were used). The MS/MS databases applied can be obtained at the curators’ websites (https://mona.fiehnlab.ucdavis.edu/, https://www.mzcloud.org/ and https://hmdb.ca/). The in-house IROA metabolite database is available within the DecoID release on GitHub (https://github.com/e-stan/DecoID/releases/), and the reference spectra have been uploaded to MoNA (submitter: E.S.; origin file: IROA_DB_for_mona_filtered_exported_addedInfo.msp).
Code availability
Source code is available on Zenodo (ref. 40) and GitHub (https://github.com/e-stan/DecoID/). Included is an example dataset along with documentation for both the DecoID Python package and user interface. A standalone executable built for Windows can alternatively be downloaded from the Patti Lab website (http://pattilab.wustl.edu/software/DecoID/).
References
Blaženović, I., Kind, T., Ji, J. & Fiehn, O. Software tools and approaches for compound identification of LC–MS/MS data in metabolomics. Metabolites 8, 31 (2018).
Baker, E. S. & Patti, G. J. Perspectives on data analysis in metabolomics: points of agreement and disagreement from the 2018 ASMS fall workshop. J. Am. Soc. Mass Spectrom. https://doi.org/10.1007/s13361-019-02295-3 (2019).
Nikolskiy, I., Mahieu, N. G., Chen, Y.-J., Tautenhahn, R. & Patti, G. J. An untargeted metabolomic workflow to improve structural characterization of metabolites. Anal. Chem. 85, 7713–7719 (2013).
Nash, W. J. & Dunn, W. B. From mass to metabolite in human untargeted metabolomics: recent advances in annotation of metabolites applying liquid chromatography–mass spectrometry data. Trends Analyt. Chem. 120, 115324 (2019).
Tsugawa, H. et al. MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat. Methods 12, 523–526 (2015).
Samanipour, S., Reid, M. J., Bæk, K. & Thomas, K. V. Combining a deconvolution and a universal library search algorithm for the nontarget analysis of data-independent acquisition mode liquid chromatography–high-resolution mass spectrometry results. Environ. Sci. Technol. 52, 4694–4701 (2018).
Li, H., Cai, Y., Guo, Y., Chen, F. & Zhu, Z.-J. MetDIA: targeted metabolite extraction of multiplexed MS/MS spectra generated by data-independent acquisition. Anal. Chem. 88, 8757–8764 (2016).
Yin, Y., Wang, R., Cai, Y., Wang, Z. & Zhu, Z.-J. DecoMetDIA: deconvolution of multiplexed MS/MS spectra for metabolite identification in SWATH-MS-based untargeted metabolomics. Anal. Chem. 91, 11897–11904 (2019).
Ting, Y. S. et al. PECAN: library-free peptide detection for data-independent acquisition tandem mass spectrometry data. Nat. Methods 14, 903–908 (2017).
Tsou, C.-C. et al. DIA-Umpire: comprehensive computational framework for data-independent acquisition proteomics. Nat. Methods 12, 258–264 (2015).
Wang, J. et al. MSPLIT-DIA: sensitive peptide identification for data-independent acquisition. Nat. Methods 12, 1106–1108 (2015).
Zhang, B., Pirmoradian, M., Chernobrovkin, A. & Zubarev, R. A. DeMix workflow for efficient identification of cofragmented peptides in high-resolution data-dependent tandem mass spectrometry. Mol. Cell. Proteomics 13, 3211–3223 (2014).
Dorfer, V., Maltsev, S., Winkler, S. & Mechtler, K. CharmeRT: boosting peptide identifications by chimeric spectra identification and retention time prediction. J. Proteome Res. 17, 2581–2589 (2018).
Houel, S. et al. Quantifying the impact of chimera MS/MS spectra on peptide identification in large-scale proteomics studies. J. Proteome Res. 9, 4152–4160 (2010).
Haug, K. et al. MetaboLights: a resource evolving in response to the needs of its scientific community. Nucleic Acids Res. 48, D440–D444 (2020).
Sud, M. et al. Metabolomics Workbench: an international repository for metabolomics data and metadata, metabolite standards, protocols, tutorials and training and analysis tools. Nucleic Acids Res. 44, D463–D470 (2016).
Kind, T. et al. Identification of small molecules using accurate mass MS/MS search. Mass Spectrom. Rev. 37, 513–532 (2017).
Zhu, X., Chen, Y. & Subramanian, R. Comparison of information-dependent acquisition, SWATH and MSAll techniques in metabolite identification study employing ultrahigh-performance liquid chromatography–quadrupole time-of-flight mass spectrometry. Anal. Chem. 86, 1202–1209 (2014).
Lawson, T. N. et al. msPurity: automated evaluation of precursor ion purity for mass spectrometry-based fragmentation in metabolomics. Anal. Chem. 89, 2432–2439 (2017).
Peckner, R. et al. Specter: linear deconvolution for targeted analysis of data-independent acquisition mass spectrometry proteomics. Nat. Methods 15, 371–378 (2018).
Kessner, D., Chambers, M., Burke, R., Agus, D. & Mallick, P. ProteoWizard: open-source software for rapid proteomics tools development. Bioinformatics 24, 2534–2536 (2008).
Wishart, D. S. et al. HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res. 46, D608–D617 (2018).
Horai, H. et al. MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass Spectrom. 45, 703–714 (2010).
Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Methodol. 58, 267–288 (1996).
Vinaixa, M. et al. Mass spectral databases for LC/MS- and GC/MS-based metabolomics: state of the field and future prospects. Trends Analyt. Chem. 78, 23–35 (2016).
Cho, K. et al. isoMETLIN: a database for isotope-based metabolomics. Anal. Chem. 86, 9358–9361 (2014).
Bonner, R. & Hopfgartner, G. SWATH data independent acquisition mass spectrometry for metabolomics. Trends Analyt. Chem. https://doi.org/10.1016/j.trac.2018.10.014 (2018).
Telu, K. H., Yan, X., Wallace, W. E., Stein, S. E. & Simón‐Manso, Y. Analysis of human plasma metabolites across different liquid chromatography/mass spectrometry platforms: cross-platform transferable chemical signatures. Rapid Commun. Mass Spectrom. 30, 581–593 (2016).
Schymanski, E. L. et al. Identifying small molecules via high-resolution mass spectrometry: communicating confidence. Environ. Sci. Technol. 48, 2097–2098 (2014).
Fiehn, O. et al. The metabolomics standards initiative (MSI). Metabolomics 3, 175–178 (2007).
Licha, D. et al. Untargeted metabolomics reveals molecular effects of ketogenic diet on healthy and tumor xenograft mouse models. Int. J. Mol. Sci. 20, 3873 (2019).
Spalding, J. L., Naser, F. J., Mahieu, N. G., Johnson, S. L. & Patti, G. J. Trace phosphate improves ZIC-pHILIC peak shape, sensitivity and coverage for untargeted metabolomics. J. Proteome Res. 17, 3537–3546 (2018).
Heller, S., McNaught, A., Stein, S., Tchekhovskoi, D. & Pletnev, I. InChI—the worldwide chemical structure identifier standard. J. Cheminform. 5, 7 (2013).
XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching and identification. Anal. Chem. https://pubs.acs.org/doi/abs/10.1021/ac051437y (2006).
Stein, S. E. & Scott, D. R. Optimization and testing of mass spectral library search algorithms for compound identification. J. Am. Soc. Mass Spectrom. 5, 859–866 (1994).
Chen, Y. & Wang, M. Hardness of approximation for sparse optimization with L0 norm. Technical Report (2016).
Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Cho, K. et al. Targeting unique biological signals on the fly to improve MS/MS coverage and identification efficiency in metabolomics. Anal. Chim. Acta 1149, 338210 (2021).
Tautenhahn, R., Böttcher, C. & Neumann, S. Highly sensitive feature detection for high-resolution LC/MS. BMC Bioinformatics 9, 504 (2008).
Stancliffe, E. e-stan/DecoID: DecoID. Zenodo https://doi.org/10.5281/zenodo.4783380 (2021).
Acknowledgements
This work was supported by funding from the National Institutes of Health grants U01 CA235482 (to G.J.P.), R35 ES028365 (to G.J.P.) and R24 OD024624 (to G.J.P.).
Author information
Contributions
G.J.P. and E.S. conceptualized the project. E.S. wrote the source code and performed data analysis with input from G.J.P. and M.S.-H. M.S. and M.S.-H. acquired the MS/MS data and prepared all samples. E.S. and G.J.P. wrote the manuscript. All authors provided comments during manuscript preparation.
Ethics declarations
Competing interests
G.J.P. is a scientific advisory board member for Cambridge Isotope Laboratories and has a research collaboration agreement with Thermo Fisher Scientific.
Additional information
Peer review information: Nature Methods thanks Mingxun Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Arunima Singh was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Stancliffe, E., Schwaiger-Haber, M., Sindelar, M. et al. DecoID improves identification rates in metabolomics through database-assisted MS/MS deconvolution. Nat Methods 18, 779–787 (2021). https://doi.org/10.1038/s41592-021-01195-3
DOI: https://doi.org/10.1038/s41592-021-01195-3