Abstract
Untargeted mass spectrometry (MS) experiments produce complex, multidimensional data that are practically impossible to investigate manually. For this reason, computational pipelines are needed to extract relevant information from raw spectral data and convert it into a more comprehensible format. Depending on the sample type and/or goal of the study, a variety of MS platforms can be used for such analysis. MZmine is an open-source software package for processing raw spectral data generated by different MS platforms, including liquid chromatography–MS, gas chromatography–MS and MS imaging. These data are typically associated with applications such as metabolomics and lipidomics. Moreover, the third version of the software, described herein, supports the processing of ion mobility spectrometry (IMS) data. The present protocol provides three distinct procedures to perform feature detection and annotation of untargeted MS data produced by different instrumental setups: liquid chromatography–(IMS–)MS, gas chromatography–MS and (IMS–)MS imaging. For training purposes, example datasets are provided together with configuration batch files (i.e., lists of processing steps and parameters) to allow new users to easily replicate the described workflows. Depending on the number of data files and available computing resources, we anticipate that replicating these workflows will take between 2 and 24 h for new MZmine users and nonexperts. Within each procedure, we provide a detailed description of all processing parameters together with instructions and recommendations for their optimization. The main outputs are aligned feature tables and lists of fragmentation spectra that can be used by third-party tools for downstream analysis.
Key points
- MZmine is a program designed to process data from untargeted mass spectrometry (MS) experiments acquired in data-dependent acquisition mode; specifically, collision-induced dissociation and higher-energy collisional dissociation.
- This protocol provides three distinct procedures to perform feature detection and annotation of untargeted MS data produced by instrumental setups: liquid chromatography–(ion mobility spectrometry–)MS, gas chromatography–MS and (ion mobility spectrometry–)MS imaging.
Data availability
All example datasets used in this protocol are publicly available through the GNPS-MassIVE, MetaboLights and Metabolomics Workbench repositories under the following accession numbers: MSV000091634, Procedure 1, LC–IMS–MS; ST000981, Procedure 2, GC–EI–MS; MSV000090328, Procedure 3, IMS–MS imaging; MSV000091642, lipid annotation (Procedure 1, Step 21), LC–IMS–MS; MTBLS265, export for statistics (Procedure 1, Step 25), LC–MS. The FBMN results can be accessed on GNPS at https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=ffd5aee568b54d9da1f3b771c459ebe5.
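As a convenience for scripting, the accession list above can be captured in a small lookup structure. The following is a minimal sketch, not part of the protocol: the class and function names are hypothetical, and the repository assigned to each accession is inferred from its prefix (MSV for GNPS-MassIVE, ST for Metabolomics Workbench, MTBLS for MetaboLights) rather than stated per entry in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleDataset:
    accession: str  # copied from the data availability statement
    usage: str      # procedure (and step) the dataset is used in
    platform: str   # instrumental setup, as written in the statement


# Repository per accession prefix (assumption: inferred from the prefix).
PREFIX_TO_REPOSITORY = {
    "MSV": "GNPS-MassIVE",
    "ST": "Metabolomics Workbench",
    "MTBLS": "MetaboLights",
}

DATASETS = [
    ExampleDataset("MSV000091634", "Procedure 1", "LC-IMS-MS"),
    ExampleDataset("ST000981", "Procedure 2", "GC-EI-MS"),
    ExampleDataset("MSV000090328", "Procedure 3", "IMS-MS imaging"),
    ExampleDataset("MSV000091642", "Procedure 1, Step 21 (lipid annotation)", "LC-IMS-MS"),
    ExampleDataset("MTBLS265", "Procedure 1, Step 25 (export for statistics)", "LC-MS"),
]


def repository_for(accession: str) -> str:
    """Return the repository name implied by the alphabetic accession prefix."""
    prefix = "".join(ch for ch in accession if ch.isalpha())
    return PREFIX_TO_REPOSITORY[prefix]


def datasets_for(procedure: int) -> list:
    """Return all example datasets whose usage refers to the given procedure."""
    label = f"Procedure {procedure}"
    return [d for d in DATASETS if d.usage.startswith(label)]


if __name__ == "__main__":
    for d in datasets_for(1):
        print(d.accession, repository_for(d.accession), d.platform)
```

Keeping the accessions in one place makes it easier to check that the dataset downloaded for a given procedure matches the one the batch file was optimized for.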
Code availability
The latest release of MZmine can be downloaded from https://www.mzmine.org. The complete source code is available at https://github.com/mzmine/mzmine3/ under the MIT licence. The MZmine documentation is hosted on GitHub and available at https://www.mzmine.org/documentation.
Acknowledgements
T.P. is supported by the Czech Science Foundation (GA CR) grant 21-11563M and by the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreement no. 891397. T.D. is supported by the European Regional Development Fund, Programme Johannes Amos Comenius project ‘IOCB MSCA PF Mobility’ no. CZ.02.01.01/00/22_010/0002733. C.B. is supported by the Czech Academy of Sciences Program to Support Prospective Human Resources. A.S. and X.D. are supported by the National Institutes of Health grant U01CA235507. P.C.D. is supported by R01GM107550, R03OD034493, R01DK136117 and NSF 2152526. We thank F. Rooks and M. FitzGerald for editing the manuscript.
Author information
Contributions
Design and oversight of the project were performed by S.H., T.D., A.S., R.S. and T.P. The online MZmine documentation was prepared by S.H., T.D., O.M., C.B., A.K. and R.S. Testing of the described workflows and improvement of the descriptions were performed by S.H., T.D., O.M., C.B., A.K., J.D.S., P.S., N.D., L.-F.N., T.H., M.O., U.K., P.C.D., D.P., X.D., J.J.J.vdH. and R.S.
Ethics declarations
Competing interests
P.C.D. served as a consultant for DSM Animal Health in 2023, is an advisor and has equity in Cybele and is a co-founder, advisor and holds equity in Ometa, Enveda and Arome with prior approval by UC San Diego. J.J.J.vdH. is a member of the Scientific Advisory Board of NAICONS Srl., Milan, Italy, and consults for Corteva Agriscience, Indianapolis, IN, USA.
Peer review
Peer review information
Nature Protocols thanks Guowang Xu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Related links
Key references using this protocol
Schmid, R. et al. Nat. Biotechnol. 41, 447–449 (2023): https://doi.org/10.1038/s41587-023-01690-2
Schmid, R. et al. Nat. Commun. 12, 3832 (2021): https://doi.org/10.1038/s41467-021-23953-9
Heuckeroth, S. et al. Nat. Commun. 14, 7495 (2023): https://doi.org/10.1038/s41467-023-43298-9
Extended data
Extended Data Fig. 1 Screenshot of the batch mode dialogue box.
The current processing steps are displayed in the ‘Batch queue’ panel. Additional steps can be selected from the ‘Modules’ panel and added using the double-arrow buttons. The current batch file can be saved using the ‘Save’ button, whereas other batch files can be imported using the ‘Load’ button. Some modules offer a ‘Show preview’ option that can be opened by ticking the corresponding checkbox. For the preview to work, data must already be imported into MZmine. The online documentation for each processing module can be opened using the ‘Help’ button.
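Because a batch file is a list of processing steps and their parameters, it can be useful to inspect one outside the dialogue box, for example to record which steps were run. The sketch below assumes the batch file is XML in which each step is a `batchstep` element with a `method` attribute and `parameter` children carrying a `name` attribute; that layout, the module names and the parameter names in the embedded example are all hypothetical and should be checked against a batch file saved from MZmine.

```python
import xml.etree.ElementTree as ET

# Hypothetical batch content: element layout, module names and parameter
# names are illustrative assumptions, not taken from a real MZmine batch file.
EXAMPLE_BATCH = """<batch>
  <batchstep method="example.modules.ImportData">
    <parameter name="File names">sample_01.mzML</parameter>
  </batchstep>
  <batchstep method="example.modules.MassDetection">
    <parameter name="Noise level">1000</parameter>
  </batchstep>
</batch>
"""


def list_batch_steps(xml_text):
    """Return [(method, {parameter name: text value})] in queue order."""
    root = ET.fromstring(xml_text)
    steps = []
    for step in root.iter("batchstep"):
        # Only direct children are read; nested parameters are ignored here.
        params = {
            p.get("name"): (p.text or "").strip()
            for p in step.findall("parameter")
        }
        steps.append((step.get("method"), params))
    return steps


if __name__ == "__main__":
    for method, params in list_batch_steps(EXAMPLE_BATCH):
        print(method, params)
```

A listing of this kind can be stored alongside the exported feature tables as a plain-text record of the processing that produced them.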
Extended Data Fig. 2 Screenshot of the ‘Raw data overview’ module.
The module displays three panels: chromatogram panel (left), mass spectrum panel (right) and scan information panel (bottom panel), which contains information for every scan in the data file.
Extended Data Fig. 3 Screenshot of the interactive visualisation panel in the Local minimum resolver module.
Two subpanels are present: one for ‘noisy’ and one for ‘good’ EIC traces. The goal of parameter optimization is to ensure that true features are detected while minimising the number of ‘noisy’ peaks retained as features. The feature lists and EIC traces to display can be chosen from the corresponding drop-down menus. Detected features are colour-filled and resolved peaks are shown in different colours.
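To illustrate the trade-off described in this legend, the following is a simplified sketch of resolving an EIC trace at its local minima. It is not the algorithm implemented in the Local minimum resolver module: the function name and the two thresholds (`min_height`, `min_ratio`) are hypothetical stand-ins chosen for illustration.

```python
def resolve_local_minima(intensities, min_height, min_ratio):
    """Split a trace at interior local minima and keep peak-like segments.

    Returns a list of (start, apex, end) index tuples. A segment is kept
    only if its apex reaches min_height and the apex is at least min_ratio
    times higher than the taller of its two edges.
    """
    n = len(intensities)
    if n == 0:
        return []

    # Boundaries: both ends plus every interior valley point.
    cuts = [0]
    for i in range(1, n - 1):
        if intensities[i] < intensities[i - 1] and intensities[i] <= intensities[i + 1]:
            cuts.append(i)
    cuts.append(n - 1)

    peaks = []
    for start, end in zip(cuts, cuts[1:]):
        apex = max(range(start, end + 1), key=lambda i: intensities[i])
        height = intensities[apex]
        edge = max(intensities[start], intensities[end])
        if height < min_height:
            continue  # too small: treated as noise
        if edge > 0 and height / edge < min_ratio:
            continue  # valley too shallow relative to the apex
        peaks.append((start, apex, end))
    return peaks


if __name__ == "__main__":
    trace = [0, 2, 10, 3, 1, 4, 12, 5, 0, 1, 0]
    print(resolve_local_minima(trace, min_height=5, min_ratio=2.0))
```

In this sketch, raising `min_height` or `min_ratio` removes more low or poorly separated segments, which mirrors the balance between keeping true features and rejecting noise. Segments separated by a shallow valley are simply dropped here rather than merged, which is a deliberate simplification.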
Extended Data Fig. 4 Screenshot of the ‘Ion mobility raw data overview’ module.
a, A summed frame spectrum with a blue indicator at the selected m/z. b, A chromatogram plot showing the BPC (black) and EIC (blue) of the selected m/z. The blue indicator shows the RT of the selected frame. c, A total ion mobilogram of the selected frame. d, A mobility vs. m/z heatmap of the selected frame. e, An ion mobility trace of the selected m/z in RT and mobility dimensions.
Supplementary information
Supplementary Data 1
All batch files optimized for each example dataset and the corresponding data processing outputs (feature lists and MS2 spectral lists).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Heuckeroth, S., Damiani, T., Smirnov, A. et al. Reproducible mass spectrometry data processing and compound annotation in MZmine 3. Nat Protoc (2024). https://doi.org/10.1038/s41596-024-00996-y
DOI: https://doi.org/10.1038/s41596-024-00996-y