SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information


Mass spectrometry is a predominant experimental technique in metabolomics and related fields, but metabolite structural elucidation remains highly challenging. We report SIRIUS 4, which provides a fast computational approach for molecular structure identification. SIRIUS 4 integrates CSI:FingerID for searching in molecular structure databases. Using SIRIUS 4, we achieved identification rates of more than 70% on challenging metabolomics datasets.

Fig. 1: Illustration of SIRIUS 4 incorporating CSI:FingerID.
Fig. 2: In silico annotation of N-acyl amide molecules.

Code availability

SIRIUS 4 is written in Java; is open source under the GNU General Public License (version 3); and works on Windows, macOS X, and Linux. In addition to the graphical front end, a comprehensive command-line version allows batch processing and integration into workflows; integration into GNPS (ref. 1), OpenMS (ref. 2), and MZmine (ref. 4) is ongoing. We also provide source code, executable binaries, documentation, support, non-commercial training data, example files, and additional information on the SIRIUS website; a source copy is hosted on GitHub. You can retrieve the InChIs of all compounds used to train CSI:FingerID from the web service.

Data availability

Data for the CASMI 2016 re-evaluation are available under a Creative Commons CC-BY license. Cross-validation data for the GNPS search re-evaluation are available under a Creative Commons CC0 1.0 Universal license. Data for the American Gut project are available in the MassIVE database (MSV000080186 and MSV000080187; Creative Commons CC0 1.0 Universal license). The analysis can be accessed via the GNPS website. Data for the study of clothing with antibacterial properties are available at MassIVE (MSV000081379; Creative Commons CC0 1.0 Universal license). The analysis is available at the GNPS website. Source data for Supplementary Figs. 6–8 are available online.


References

  1. Wang, M. et al. Nat. Biotechnol. 34, 828–837 (2016).

  2. Röst, H. L. et al. Nat. Methods 13, 741–748 (2016).

  3. Tsugawa, H. et al. Nat. Methods 12, 523–526 (2015).

  4. Pluskal, T., Castillo, S., Villar-Briones, A. & Oresic, M. BMC Bioinformatics 11, 395 (2010).

  5. Smith, C. A., Want, E. J., O’Maille, G., Abagyan, R. & Siuzdak, G. Anal. Chem. 78, 779–787 (2006).

  6. Böcker, S., Letzel, M. C., Lipták, Z. & Pervukhin, A. Bioinformatics 25, 218–224 (2009).

  7. Böcker, S. & Rasche, F. Bioinformatics 24, i49–i55 (2008).

  8. Böcker, S. & Dührkop, K. J. Cheminform. 8, 5 (2016).

  9. Dührkop, K., Shen, H., Meusel, M., Rousu, J. & Böcker, S. Proc. Natl Acad. Sci. USA 112, 12580–12585 (2015).

  10. Shen, H., Dührkop, K., Böcker, S. & Rousu, J. Bioinformatics 30, i157–i164 (2014).

  11. Heinonen, M., Shen, H., Zamboni, N. & Rousu, J. Bioinformatics 28, 2333–2341 (2012).

  12. Pirhaji, L. et al. Nat. Methods 13, 770–776 (2016).

  13. Hatzimanikatis, V. et al. Bioinformatics 21, 1603–1609 (2005).

  14. Meusel, M. et al. Anal. Chem. 88, 7556–7566 (2016).

  15. Ludwig, M., Dührkop, K. & Böcker, S. Bioinformatics 34, i333–i340 (2018).

  16. Kim, S. et al. Nucleic Acids Res. 44, D1202–D1213 (2016).

  17. Jeffryes, J. G. et al. J. Cheminform. 7, 44 (2015).

  18. Schymanski, E. L. et al. J. Cheminform. 9, 22 (2017).

  19. Pence, H. E. & Williams, A. J. Chem. Educ. 87, 1123–1124 (2010).

  20. CASMI 2017 Team. And the results are. CASMI 2017 (2017).

  21. Cohen, L. J. et al. Nature 549, 48–53 (2017).

  22. Dührkop, K., Ludwig, M., Meusel, M. & Böcker, S. in Algorithms in Bioinformatics (WABI 2013) (eds Darling, A. & Stoye, J.) 45–58 (Springer, Berlin, 2013).

  23. Böcker, S. & Lipták, Zs. Algorithmica 48, 413–432 (2007).

  24. Böcker, S., Letzel, M. C., Lipták, Zs. & Pervukhin, A. in Algorithms in Bioinformatics (WABI 2006) (eds Bücher, P. & Moret, B. M. E.) 12–23 (Springer, Berlin, 2006).

  25. Rauf, I., Rasche, F., Nicolas, F. & Böcker, S. J. Comput. Biol. 20, 311–321 (2013).

  26. White, W. T. J., Beyer, S., Dührkop, K., Chimani, M. & Böcker, S. in Computing and Combinatorics (COCOON 2015) (eds Xu, D., Du, D. & Du, D.) 310–322 (Springer, Cham, 2015).

  27. Dührkop, K., Lataretu, M. A., White, W. T. J. & Böcker, S. in Proc. 18th International Workshop on Algorithms in Bioinformatics (WABI 2018) (eds Parida, L. & Ukkonen, E.) 23:1–23:14 (Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 2018).

  28. GNU Linear Programming Kit (GLPK) v. 4.60 (Free Software Foundation, 2016).

  29. CPLEX v. 12.8 (IBM, 2017).

  30. Senior, J. Am. J. Math. 73, 663–689 (1951).

  31. Pluskal, T., Uehara, T. & Yanagida, M. Anal. Chem. 84, 4396–4403 (2012).

  32. Dührkop, K., Hufsky, F. & Böcker, S. Mass Spectrom. (Tokyo) 3, S0037 (2014).

  33. LeCun, Y., Bengio, Y. & Hinton, G. Nature 521, 436–444 (2015).

  34. Böcker, S. & Mäkinen, V. IEEE/ACM Trans. Comput. Biol. Bioinform. 5, 91–100 (2008).

  35. Cortes, C., Mohri, M. & Rostamizadeh, A. J. Mach. Learn. Res. 13, 795–828 (2012).

  36. Shen, H., Szedmak, S., Brouard, C. & Rousu, J. in Discovery Science (DS 2016) (eds Calders, T., Ceci, M. & Malerba, D.) 427–441 (Springer, Cham, 2016).

  37. Horai, H. et al. J. Mass. Spectrom. 45, 703–714 (2010).

  38. Brodley, C. E. & Friedl, M. A. J. Artif. Intell. Res. 11, 131–167 (1999).

  39. Rogers, D. & Hahn, M. J. Chem. Inf. Model. 50, 742–754 (2010).

  40. Willighagen, E. L. et al. J. Cheminform. 9, 33 (2017).

  41. Wang, R., Gao, Y. & Lai, L. Perspect. Drug Discov. Des. 19, 47–66 (2000).

  42. Steinbeck, C. et al. J. Chem. Inf. Comput. Sci. 43, 493–500 (2003).

  43. SIRIUS v. 4.0.1 (Friedrich-Schiller-University Jena, 2018).

  44. Melnik, A. et al. Data generation and analysis with SIRIUS 4 on two biological case studies. Protocol Exchange (2019).


We gratefully acknowledge financial support by the Deutsche Forschungsgemeinschaft (BO 1910/20) to S.B. and the Academy of Finland (310107/MACOME) to J.R. We thank the GNPS community, S. Stein, F. Kuhlmann, and Agilent Technologies Inc. (Santa Clara, CA, USA) for providing data that were used to estimate the hyperparameters of SIRIUS 4 and to train CSI:FingerID. We also thank F. Kuhlmann and Agilent Technologies for data used to evaluate the isotope scoring.

Author information

K.D., P.C.D., J.R., and S.B. designed the research. K.D., M.F., M.L., and J.R. developed computational methods. K.D., M.F., M.L., and M.M. implemented computational methods and performed method evaluations, coordinated by S.B. A.A.A. and A.V.M. performed the biological case studies, coordinated by P.C.D. S.B. wrote the manuscript, to which K.D., M.F., M.L., A.A.A., and A.V.M. contributed, in cooperation with all other authors.

Corresponding author

Correspondence to Sebastian Böcker.

Ethics declarations

Competing interests

S.B. holds patents (Japanese patent 5559816 and US patent 8263931) whose value might be affected by this publication.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 SIRIUS 4 graphical user interface.

a, The molecular formula ‘SIRIUS Overview’ tab displays all molecular formula candidates of some query compound in a single display. b, The ‘Spectra view’ tab shows the individual mass spectra. c, Similarly, the ‘Tree view’ tab allows a closer look at the fragmentation trees for each molecular formula candidate. d,e, The next two tabs show the results of the CSI:FingerID molecular structure search: (d) the ‘CSI:FingerID Overview’ tab summarizes all molecular formula candidates, whereas (e) the ‘CSI:FingerID Details’ tab presents results for each molecular formula candidate individually. Results can be filtered and searched; database links are provided for candidates when possible. f, Finally, the ‘Predicted Fingerprint’ tab presents the fingerprint predicted by CSI:FingerID, independently of any database searching.

Supplementary Figure 2 Job View in the SIRIUS 4 graphical user interface.

The job view displays name, type, state (running, queued, waiting and failed) and progress of SIRIUS 4 jobs in the job-scheduling system. Jobs can be canceled individually, and the job scheduler automatically handles potential dependencies. Logging information can be shown individually for each job.

Supplementary Figure 3 Example of SIRIUS 4 isotope pattern analysis for MS/MS data.

a, MSE spectrum for CASMI 2017 challenge 226, a derivative of cyclochlorotine with sum formula C24H31Cl2N5O7. b, A single isotope pattern in the spectrum is highlighted and shown in detail. The simulated isotope pattern of C23H28Cl2NO7 is drawn below in red. c, Part of the fragmentation graph corresponding to this spectrum. Yellow nodes correspond to the isotope peaks of C23H28Cl2NO7. We see that the first isotope peak of C23H28Cl2NO7 can also be explained as the monoisotopic peak of C24H24ClN3O7.
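The ambiguity described in panel c can be checked with a short calculation. The sketch below is not part of SIRIUS 4; it is a minimal illustration that assumes standard monoisotopic element masses and approximates the first isotope peak by a single 13C substitution (contributions from 15N, 2H, and 17O are ignored). The helper names parse_formula and monoisotopic_mass are hypothetical, and the charge carrier is left out because it shifts both masses equally.

```python
import re

# Standard monoisotopic masses in Da (assumed values, not taken from the article).
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "Cl": 34.9688527}
C13_SHIFT = 13.0033548 - 12.0  # mass gained when one 12C is replaced by 13C


def parse_formula(formula):
    """Turn a formula such as 'C23H28Cl2NO7' into {'C': 23, 'H': 28, ...}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONO[el] * n for el, n in parse_formula(formula).items())


# First isotope peak of C23H28Cl2NO7, approximated as one 13C substitution.
first_isotope_peak = monoisotopic_mass("C23H28Cl2NO7") + C13_SHIFT
# Monoisotopic peak of the competing explanation.
alternative = monoisotopic_mass("C24H24ClN3O7")

delta_mda = (alternative - first_isotope_peak) * 1000.0
delta_ppm = (alternative - first_isotope_peak) / alternative * 1e6
print(f"difference: {delta_mda:.2f} mDa ({delta_ppm:.1f} ppm)")
```

Under these assumptions the two explanations differ by only a few millidalton, so a mass tolerance of around 10 ppm cannot separate them by mass alone.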

Supplementary Figure 4 Support Vector Machine for classifying molecular formulas of biomolecules.

a, Histogram and kernel density plots of the linear Support Vector Machine (SVM); plotted are molecular formulas from the biomolecule structure database (Supplementary Table 2), PubChem, and a random subset of decompositions. b, Receiver operating characteristic (ROC) plot of the classifier, biomolecules versus random decompositions. The area under the curve (AUC) of ROC is 0.965.
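As a reminder of what the AUC in panel b measures, the following sketch computes it directly from classifier scores as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The scores are made up for illustration and are not the SVM outputs from the article; roc_auc is a hypothetical helper, not a SIRIUS 4 function.

```python
def roc_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical decision values: biomolecule formulas versus random decompositions.
biomolecule_scores = [2.1, 1.4, 0.9, 0.3, -0.2]
random_decomposition_scores = [0.5, -0.4, -1.1, -1.8]

auc = roc_auc(biomolecule_scores, random_decomposition_scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of scores; for large sets a rank-based formulation gives the same value more efficiently.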

Supplementary Figure 5 Predicted fingerprint of the query clobutinol.

Only molecular properties with at least four heavy atoms are displayed. Different from Fig. 1b–f, a second molecular property predicted to be present (green bars) has been selected; again, SIRIUS 4 displays a few example structures that contain the corresponding property. The two substructures (Fig. 1f and here) allow the user to deduce information about the query structure, without the need to query a molecular structure database.

Supplementary Figure 6 Comparing SIRIUS 3.0 and current SIRIUS 4.

Both versions are compared using isotope pattern (MS1) and MS/MS data from 3,965 compounds with mass ranging from 75 Da to 1,289 Da. a, Evaluation of molecular formula annotation. We report the number of instances where the correct molecular formula was ranked in the top k, for k = 1, …, 10. We evaluate exclusively the isotope pattern scoring of SIRIUS 3.0 (green diamonds) and SIRIUS 4 (blue diamonds), as well as the combined analysis of isotope patterns and MS/MS data (blue and green circles). b, Running time comparison, combined analysis. We sort compounds by mass, and report the time SIRIUS 3.0 and SIRIUS 4 require for computing the k% lightest compounds in the dataset. Note the logarithmic y-scale. SIRIUS 3.0 stopped after 154 h of computation with a timeout/memory exception, failing to compute the 90 heaviest compounds.
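The top-k evaluation in panel a can be sketched as follows. This is an illustration only: the ranks are invented, top_k_counts is a hypothetical helper, and None is used here to mark a compound whose correct molecular formula does not appear in the candidate list at all.

```python
def top_k_counts(ranks, max_k=10):
    """Return a list whose entry k-1 is the number of compounds
    with the correct formula ranked in the top k."""
    return [
        sum(1 for r in ranks if r is not None and r <= k)
        for k in range(1, max_k + 1)
    ]


# Hypothetical rank of the correct molecular formula for ten compounds.
ranks = [1, 1, 2, 1, 5, None, 3, 1, 12, 2]

counts = top_k_counts(ranks)
for k, count in enumerate(counts, start=1):
    print(f"top {k}: {count} of {len(ranks)}")
```

Because each count includes all compounds counted at smaller k, the resulting curve is non-decreasing in k.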

Source data

Supplementary Figure 7 Methods evaluation, structural elucidation searching GNPS in PubChem.

The percentage of correctly identified structures found in the top k output of a method. Searching N = 3,868 compounds from GNPS in PubChem (15 September 2014). CSI:FingerID 1.1 is evaluated here; identification rates for all other methods are taken from ref. 43 in the Supplementary Information reference list.

Source data

Supplementary Figure 8 Methods evaluation, contribution of method enhancements and new data.

Cumulative contribution of different aspects from version 1.0 to 1.1 of CSI:FingerID: new kernels (green), added ECFP fingerprints (red), and additional training data (violet).

Source data

Supplementary Figure 9 Identification of leukotrienes on human skin: network cluster.

A large network cluster indicating a number of structurally related compounds, with no compounds annotated via library search throughout the entire cluster.

Supplementary Figure 10 Identification of leukotrienes on human skin: fragmentation tree.

Fragmentation spectra and tree that explains the experimentally observed MS/MS fragmentation pattern of the ion with m/z 440.246.

Supplementary Figure 11 Identification of leukotrienes on human skin: compound structure.

Structure of the compound with the highest score, 14,15-leukotriene E4 (LTE4). This structure served as a starting point for annotation of other compounds in the cluster.

Supplementary Figure 12 Identification of leukotrienes on human skin: spectral validation.

Validation of LTE4, as predicted by SIRIUS 4 with CSI:FingerID, by spectral and retention time match with a synthetic standard.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–12, Supplementary Tables 1 and 2, Supplementary Notes 1–10 and Supplementary Results 1–6

Reporting Summary

Supplementary Protocol

Data generation and analysis with SIRIUS 4 on two biological case studies.

Supplementary Table 3

Molecular formula identification with SIRIUS 3 and SIRIUS 4 using solely MS1 data or MS1 and MS/MS data.

Supplementary Table 4

CASMI 2016 data reanalysis with SIRIUS 4.

Supplementary Table 5

Molecular structure identification rates using CSI:FingerID version 1.0 versus 1.1.

Source data

Cite this article

Dührkop, K., Fleischauer, M., Ludwig, M. et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat Methods 16, 299–302 (2019).
