Extended Data Fig. 4: Molecular formula annotation error rates. | Nature Machine Intelligence

From: Database-independent molecular formula annotation using Gibbs sampling through ZODIAC

Error rates on five datasets. Methods are SIRIUS; ZODIAC (without anchors); exact mass over the elements carbon, hydrogen, nitrogen and oxygen (‘exact mass (CHNO)’); exact mass over CHNO plus phosphorus and sulfur (‘exact mass (CHNOPS)’); Seven Golden Rules with elements CHNOPS (‘7GR (CHNOPS)’); Seven Golden Rules with elements CHNOPS plus bromine and chlorine (‘7GR (CHNOPSBrCl)’); and GenForm. Between 44 and 271 compounds were processed per dataset; see Extended Data Fig. 1 for details. GenForm is the only publicly available tool for molecular formula inference besides SIRIUS, and it considers both the isotope pattern and the fragmentation spectrum. GenForm was restricted to elements CHNOPS, and 7GR (CHNOPSBrCl) cannot annotate iodine-containing compounds; consequently, only SIRIUS and ZODIAC are in theory capable of annotating the two novel molecular formulas C24H47BrNO8P and C15H30ClIO5 reported here. Error rates are based on all compounds with established ground truth, resulting in slightly higher error rates for SIRIUS and ZODIAC on dendroides, tomato and mice stool compared to Fig. 1. Error rates on the five datasets agree well with the mass of compounds in the respective dataset (see Extended Data Fig. 1): larger compounds result in substantially more candidates to be considered, in particular for a larger set of elements, and thus in worse annotation rates. For evaluation details, see the Methods section.
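The effect of compound mass and element set on the number of candidates can be illustrated with a minimal sketch. It is not the procedure used by any of the tools compared here: it simply counts all element compositions whose monoisotopic mass falls within a tolerance of a neutral target mass, with no chemical plausibility filters. The function name `count_candidates`, the 10 ppm default tolerance and the rounded mass table are assumptions made for illustration.

```python
from math import ceil, floor

# Monoisotopic masses in Da, rounded; illustrative values only.
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "P": 30.97376200,
    "S": 31.97207117,
}


def count_candidates(target_mass, elements, ppm=10.0):
    """Count compositions over `elements` within `ppm` of `target_mass`.

    Hypothetical helper: brute-force enumeration without valence or
    ratio filters, so chemically implausible compositions are counted too.
    """
    if "H" not in elements:
        raise ValueError("this sketch assumes hydrogen is in the element set")
    tol = target_mass * ppm * 1e-6
    heavy = [e for e in elements if e != "H"]
    m_h = MONOISOTOPIC_MASS["H"]

    def recurse(index, remaining):
        if index == len(heavy):
            # All heavy atoms are fixed; count hydrogen numbers that
            # bring the total mass inside the tolerance window.
            lo = max(0, ceil((remaining - tol) / m_h))
            hi = floor((remaining + tol) / m_h)
            return max(0, hi - lo + 1)
        mass = MONOISOTOPIC_MASS[heavy[index]]
        total = 0
        n = 0
        # Try every atom count that does not overshoot the target.
        while n * mass <= remaining + tol:
            total += recurse(index + 1, remaining - n * mass)
            n += 1
        return total

    return recurse(0, target_mass)


if __name__ == "__main__":
    chno = ("C", "H", "N", "O")
    chnops = ("C", "H", "N", "O", "P", "S")
    glucose = 180.06339  # neutral monoisotopic mass of C6H12O6
    print("CHNO:  ", count_candidates(glucose, chno))
    print("CHNOPS:", count_candidates(glucose, chnops))
```

Running the sketch at the same target mass with both element sets shows how the candidate count changes when phosphorus and sulfur are allowed; every CHNO composition is also a CHNOPS composition, so the second count can never be smaller than the first.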
