Innovation in mass spectrometry (MS) and the rapidly increasing throughput and sensitivity of MS instrumentation require adaptations and innovations in data processing tools. Here, we introduce MZmine 3, a scalable MS data analysis platform that supports hybrid datasets from various instrumental setups, including liquid and gas chromatography (LC and GC)–MS, ion mobility spectrometry (IMS)–MS and MS imaging. In particular, the integration of IMS–MS imaging and LC–IMS–MS datasets provides opportunities for spatial metabolomics analyses with increased annotation confidence.
Over the past decade, the MZmine project has evolved into a community-driven, collaborative effort. As an open-source ecosystem for MS data processing, MZmine is cross-platform software (Supplementary Note 1) that can be tuned for robust, scalable and reproducible data analysis on personal computers as well as high-performance supercomputers. The project has seen continuous development since its inception in 2004 (refs. 1,2). Community additions (Fig. 1a) introduced various functions, such as performant feature detection workflows3,4, modules for lipid annotation5 and strong ties to other community projects (Fig. 1b). Here, data exchange formats and direct interfaces (listed under ‘Tool integration’ in the documentation) enable downstream analysis in external tools, such as compound annotation in SIRIUS6 and statistical analysis in MetaboAnalyst7, and directly bind MZmine results into the molecular networking ecosystem of the Global Natural Products Social Molecular Networking (GNPS) web platform (Supplementary Note 2)8,9,10.
Fig. 1 | a, Overview of active developments and key additions to MZmine since the first publication, which led to over 180 modules that now drive interactive, reproducible and efficient data processing and visualization in MZmine 3. b, Data exchange formats and direct interfaces enable downstream analysis with strong ties to projects like GNPS, SIRIUS and MetaboAnalyst. c, The integrative LC–MS and IMS–MS imaging workflow applies feature detection in RT, ion mobility and m/z dimension to MS data stored in open or vendor formats. Comprehensive processing and annotation results are merged into an aligned feature list. d, An aligned feature list with one ion feature detected in LC–IMS–MS samples and aligned to one MALDI–IMS–MS ion feature image. Annotation results (‘Lipid annotation’ column) and interactive charts include the table columns ‘Shapes’ (extracted ion chromatograms), ‘Mobilograms’ (extracted ion mobilograms) and ‘Images’ (extracted ion images).
Recent advances in MS instrumentation push sensitivity, resolving power and data acquisition speed, resulting in increased data volume and complexity. Notably, IMS gains traction in the field by including an additional separation dimension to LC–MS or imaging-based techniques like matrix-assisted laser desorption/ionization (MALDI)–MS. These advances introduce new acquisition modes (for example, parallel accumulation–serial fragmentation (PASEF))11 or enable combination of IMS and imaging, which was shown to improve annotation quality in MS imaging12. Furthermore, the number of large-scale cohort and multifactorial studies in clinical, environmental and other fields is growing, as registered in the three main metabolomics data repositories: MassIVE/GNPS8, MetaboLights and Metabolomics Workbench13. The need for scalable, reproducible and flexible data analysis workflows that can combine MS data from various sources remains unaddressed by existing tools. For example, to combine LC–(IMS–)MS and MS imaging results from the same sample, users are forced to master multiple software tools12 that divide the workflow and are specialized for either chromatography–MS (for example, MS-DIAL, XCMS, OpenMS)14,15,16 or MS imaging (for example, METASPACE, rMSI, Cardinal MSI, SpectralAnalysis)17.
The integrative spatial metabolomics workflow in MZmine 3 (Fig. 1c) imports LC–IMS–MS and IMS–MS imaging datasets stored in either open or vendor-specific formats and processes them by non-targeted feature detection. This entails resolving peak shapes for ion features in both the retention time (RT) and ion mobility dimension in LC–IMS–MS and extracting mobility-resolved ion image features with spatial distributions in IMS–MS imaging (Supplementary Figs. 1–3). Individual features from both methodologies are subsequently represented and aligned by their RT (LC only), m/z and ion mobility values. The resulting aligned feature list combines the strengths of the individual analytical methods by integrating the compound annotation capabilities of modern chromatography-based MS with spatial metabolite distributions that can be mapped to histological data, addressing the issue of missing MS2 data in most imaging studies. For data evaluation, MZmine organizes annotations in a feature table with interactive charts, exemplified in Fig. 1d for one ion feature detected in LC–IMS–MS samples and aligned to an ion image from one MALDI–IMS–MS imaging dataset. An exemplary spatial metabolomics workflow leading to LC–IMS–MS-resolved molecular networks, enriched with spatial ion feature information, is described in Supplementary Note 2 and Supplementary Fig. 4. Additional visualization modules (Supplementary Fig. 5) connect all available data dimensions; a fast memory-mapped data back end enables interactive exploration.
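The alignment step described above can be illustrated with a minimal Python sketch. It is not MZmine's aligner: the `Feature` class, the function names, the greedy nearest-match strategy and the default tolerances (10 ppm, 0.02 mobility units) are all assumptions chosen for illustration. Because imaging features carry no RT, the sketch matches LC–IMS–MS and imaging features on m/z and ion mobility only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Feature:
    """Hypothetical ion feature; rt is None for imaging features (no chromatography)."""
    mz: float
    mobility: float
    rt: Optional[float] = None


def ppm_error(mz_a: float, mz_b: float) -> float:
    """Relative m/z difference in parts per million, taken against mz_b."""
    return abs(mz_a - mz_b) / mz_b * 1e6


def align_imaging_to_lc(
    lc: List[Feature],
    imaging: List[Feature],
    mz_tol_ppm: float = 10.0,     # assumed tolerance, not an MZmine default
    mobility_tol: float = 0.02,   # assumed tolerance, not an MZmine default
) -> List[Tuple[Feature, Feature]]:
    """Greedily pair each imaging feature with the closest unused LC feature.

    A pair is accepted only if both the m/z and the mobility difference fall
    within tolerance; the score is the sum of both differences, each scaled
    by its tolerance.
    """
    pairs: List[Tuple[Feature, Feature]] = []
    used = set()
    for img in imaging:
        best_idx, best_score = None, None
        for i, lc_feature in enumerate(lc):
            if i in used:
                continue
            d_ppm = ppm_error(img.mz, lc_feature.mz)
            d_mob = abs(img.mobility - lc_feature.mobility)
            if d_ppm > mz_tol_ppm or d_mob > mobility_tol:
                continue
            score = d_ppm / mz_tol_ppm + d_mob / mobility_tol
            if best_score is None or score < best_score:
                best_idx, best_score = i, score
        if best_idx is not None:
            used.add(best_idx)
            pairs.append((lc[best_idx], img))
    return pairs


# Made-up values for illustration only.
lc_features = [Feature(760.5851, 1.392, rt=5.31), Feature(782.5670, 1.410, rt=5.62)]
imaging_features = [Feature(760.5856, 1.395), Feature(810.6000, 1.450)]
for lc_f, img_f in align_imaging_to_lc(lc_features, imaging_features):
    print(f"LC m/z {lc_f.mz} (RT {lc_f.rt}) <-> image m/z {img_f.mz}")
```

In this sketch an aligned pair lets the imaging feature inherit whatever annotation the LC feature carries, which mirrors the motivation given above for imaging data that lack MS2 spectra.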
In MZmine 3, special attention was directed toward scalability due to the ever-increasing study sizes that lead to large volumes of raw data, particularly in the case of LC–IMS–MS datasets. Efficient memory management and parallelization removed bottlenecks, resulting in an 89% reduction in processing time for 250 dissolved organic matter samples when compared to MZmine 2. A stress test demonstrated high sample throughput, where the mean processing times amounted to 0.1% to 0.3% of the total data acquisition time for six different LC–MS datasets (Supplementary Note 3 and Supplementary Fig. 6). Further, MZmine 3 was benchmarked using 8,273 fecal LC–MS2 samples, requiring just 47 min of processing time (see hardware specifications in Supplementary Note 3).
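To put these figures in perspective, the short Python sketch below converts them into a speed-up factor and an average throughput. The function names are illustrative. The per-sample value is the average wall-clock time across the whole parallel batch on the hardware of Supplementary Note 3, not the compute time of a single sample.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional reduction in processing time into a speed-up factor.

    A reduction of 0.89 means the new run takes 11% of the old time,
    so the speed-up is 1 / (1 - 0.89).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def seconds_per_sample(total_minutes: float, n_samples: int) -> float:
    """Average wall-clock seconds per sample for a batch run."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_minutes * 60.0 / n_samples


# Figures reported in the text above.
print(f"speed-up vs MZmine 2: {speedup_from_reduction(0.89):.1f}x")
print(f"fecal benchmark: {seconds_per_sample(47, 8273):.2f} s per sample")
```

With the reported numbers, this works out to roughly a ninefold speed-up over MZmine 2 and about a third of a second per sample in the fecal benchmark.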
The improved performance of MZmine 3 over previous MZmine versions now allows processing of large datasets, including large-volume LC–IMS–MS data. For new users, the MZmine website contains detailed manuals and video tutorials, and the new processing wizard in MZmine provides starting points for various standard workflows and mass spectrometer types. In addition, a development tutorial is available for potential new contributors, and the modular design of MZmine makes it possible to test and implement new ideas within the MZmine framework.
Data availability
Datasets are available on MassIVE8 with the following accession IDs: MSV000088054, human cohort study, LC–MS, neg; MSV000087728, diverse plant extracts, LC–MS2, top-3 DDA, pos; MSV000090079, dissolved organic matter, LC–MS2, top-5 DDA, pos; MSV000090328, sheep brain, LC–TIMS-MS, PASEF, pos; MSV000090327, piper plant extracts, LC–TIMS-MS, PASEF, pos. IMS resolved ion identity molecular networking results are available through GNPS: https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=7a06fa3dfadd4158bcb4ee300b574747
Code availability
The latest release of MZmine can be downloaded from https://www.mzmine.org. The complete source code is available at https://github.com/mzmine/mzmine3/ under the MIT license. The MZmine documentation is hosted on GitHub and available at https://www.mzmine.org/documentation.
References
Katajamaa, M., Miettinen, J. & Oresic, M. Bioinformatics 22, 634–636 (2006).
Pluskal, T., Castillo, S., Villar-Briones, A. & Oresic, M. BMC Bioinformatics 11, 395 (2010).
Smirnov, A. et al. Anal. Chem. 91, 9069–9077 (2019).
Du, X., Smirnov, A., Pluskal, T., Jia, W. & Sumner, S. Methods Mol. Biol. 2104, 25–48 (2020).
Korf, A., Jeck, V., Schmid, R., Helmer, P. O. & Hayen, H. Anal. Chem. 91, 5098–5105 (2019).
Dührkop, K. et al. Nat. Methods 16, 299–302 (2019).
Pang, Z. et al. Nucleic Acids Res. 49, W388–W396 (2021).
Wang, M. et al. Nat. Biotechnol. 34, 828–837 (2016).
Nothias, L.-F. et al. Nat. Methods 17, 905–908 (2020).
Schmid, R. et al. Nat. Commun. 12, 3832 (2021).
Meier, F. et al. J. Proteome Res. 14, 5378–5387 (2015).
Helmer, P. O. et al. Anal. Chem. 93, 2135–2143 (2021).
Aksenov, A. A., da Silva, R., Knight, R., Lopes, N. P. & Dorrestein, P. C. Nat. Rev. Chem. 1, 0054 (2017).
Smith, C. A., Want, E. J., O’Maille, G., Abagyan, R. & Siuzdak, G. Anal. Chem. 78, 779–787 (2006).
Tsugawa, H. et al. Nat. Biotechnol. 38, 1159–1163 (2020).
Röst, H. L. et al. Nat. Methods 13, 741–748 (2016).
Weiskirchen, R., Weiskirchen, S., Kim, P. & Winkler, R. J. Cheminform. 11, 16 (2019).
Acknowledgements
We thank Christopher Jensen and Gauthier Boaglio for their contributions to the MZmine codebase. We thank Jianbo Zhang and Zachary Russ for their donations to MZmine development. The MZmine 3 logo was designed by the Bioinformatics & Research Computing group at the Whitehead Institute for Biomedical Research. T.P. is supported by Czech Science Foundation (GA CR) grant 21-11563M and by the European Union’s Horizon 2020 research and innovation programme under Marie Skłodowska-Curie grant agreement 891397. Support for P.C.D. was from US NIH U19 AG063744, P50HD106463, 1U24DK133658 and BBSRC-NSF award 2152526. T.S. acknowledges funding by Deutsche Forschungsgemeinschaft (441958208). M. Wang acknowledges the US Department of Energy Joint Genome Institute (https://ror.org/04xm1d337, a DOE Office of Science User Facility) and is supported by the Office of Science of the US Department of Energy operated under subcontract No. 7601660. E.R. and H.H. thank Wen Jiang (HILICON AB) for providing the iHILIC Fusion(+) column for HILIC measurements. M.F., K.D. and S.B. are supported by Deutsche Forschungsgemeinschaft (BO 1910/20). L.-F.N. is supported by the Swiss National Science Foundation (project 189921). D.P. was supported through the Deutsche Forschungsgemeinschaft (German Research Foundation) through the CMFI Cluster of Excellence (EXC-2124 — 390838134 project-ID 1-03.006_0) and the Collaborative Research Center CellMap (TRR 261 - 398967434). J.-K.W. acknowledges the US National Science Foundation (MCB-1818132), the US Department of Agriculture, and the Chan Zuckerberg Initiative. MZmine developers have received support from the European COST Action CA19105 — Pan-European Network in Lipidomics and EpiLipidomics (EpiLipidNET). We acknowledge the support of the Google Summer of Code (GSoC) program, which has funded the development of several MZmine modules through student projects. We thank Adam Tenderholt for introducing MZmine to the GSoC program.
Author information
Contributions
R.S., S.H., T.P. are coordinating the MZmine open source project. S.H., R.S., P.C.D., T.P., A.K. wrote and edited the initial manuscript. S.H., R.S., A.K., T.P. conceived the combined workflow for MALDI–IMS–MS imaging and LC–IMS–MS, developed the code and tested the workflow. R.S., S.H., A.K., T.P., A. Smirnov, O. Myers, T.S.D., R.B., K.J.M., N.H., M.L., A. Sarvepalli, Z.Z., M.F., K.D., M. Wesner, M. Wang, S.J.H., O. Mokshyna, K.P., C.J.P., T.R.F., T.S. and more have contributed open source code to MZmine. C.B., T.D., S.H., L.M., O. Mokshyna, R.S., M.E. wrote the documentation for MZmine. L.-F.N., A.R.-U., A.B., R.S., S.H., A.K., M.O., P.C.D., D.P., U.K., J.-K.W., H.H., X.D., S.B. initiated and/or supervised projects related to MZmine development. T.S., A.K., S.H., R.S., T.P., A.R.-U., A.B., N.H., D.P. were involved in the supervision of students for the Google Summer of Code program. R.S., L.-F.N., D.P., A. Sarvepalli, Z.Z., M. Wang, P.C.D. contributed to the linking with GNPS to facilitate molecular networking in MZmine. R.S., D.P., L.-F.N., M. Wang conceived and developed the FBMN and IIMN workflows in MZmine. S.H., R.S., A.K. implemented imzML support and developed imaging feature detection. S.H. developed the ion mobility data support, native .tdf support, ion mobility gap filling; added ion mobility visualization modules; recreated project load/save. A.K. provided TDF-SDK for native .tdf import and supervised S.H. for its implementation. S.H., A.K. developed ion mobility feature detection. A.K., H.H. developed lipid annotation modules and workflows and made them IMS aware. R.S., M. Wang developed parallel gap-filling. S.H., R.S. developed parallel sample alignment. T.S.D. implemented mzTab, MGF and MSP support and various peak information (FWHM, tailing factor, asymmetry factor, RT start and RT end). R.S., C.B., A.K. worked on the mass spectral library creation and matching workflows. K.D., M.F., R.S., S.H., S.B. assisted with the integration of SIRIUS and data exchange. A.R.-U., T.P. conceived the exact mass calibration module. M.L. developed support for the open data format ‘Aird’. S.J.H. developed diagnostic fragmentation filtering. M. Wesner developed the mass-voltammogram module. R.S., S.H. profiled and optimized MZmine’s memory consumption and processing throughput. S.H. prepared sheep brain lipid extracts, prepared MALDI samples, acquired imaging data, analyzed imaging and chromatographic data. H.R. and A.J. planned and carried out animal study ZH235/17. A.J. prepared thin sections and histologic tissue staining of the sheep brain dataset and supplied the tissue samples for extraction. P.O.H., C.B. provided testing data and feedback for LC–MS and IMS–MS imaging workflows. E.R. acquired LC–IMS–MS2 lipid data. R.S., S.H., D.P. conducted the performance tests. All authors edited and approved the final manuscript.
Ethics declarations
Competing interests
A.K. is employed at Bruker Daltonics GmbH & Co. KG. S.B., K.D. and M.F. are co-founders of Bright Giant. P.C.D. is a scientific advisor for Cybele and is a scientific advisor and a co-founder of Enveda, Arome and Ometa with prior approval by the University of California San Diego. M. Wang is a co-founder of Ometa Labs LLC. J.-K.W. is a member of the Scientific Advisory Board and a shareholder of DoubleRainbow Biosciences, Galixir and Inari Agriculture, which develop biotechnologies related to natural products, drug discovery and agriculture.
Peer review
Peer review information
Nature Biotechnology thanks Xiaotao Shen and Zheng-Jiang Zhu for their contribution to the peer review of this work.
Supplementary information
Supplementary Information
Supplementary Figs. 1–6, Supplementary Notes 1–3, Supplementary References
Supplementary Code
A ZIP archive with batch files for data processing using MZmine 3 and MZmine 2
About this article
Cite this article
Schmid, R., Heuckeroth, S., Korf, A. et al. Integrative analysis of multimodal mass spectrometry data in MZmine 3. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01690-2
DOI: https://doi.org/10.1038/s41587-023-01690-2