Quantification for non-targeted LC/MS screening without standard substances

Non-targeted and suspect analyses with liquid chromatography/electrospray/high-resolution mass spectrometry (LC/ESI/HRMS) are gaining importance as they enable identification of hundreds or even thousands of compounds in a single sample. Here, we present an approach that addresses the challenge of quantifying compounds identified from LC/HRMS data without authentic standards. The approach uses random forest regression to predict the response of the compounds in ESI/HRMS with a mean error of 2.2 and 2.0 times for ESI positive and negative mode, respectively. We observe that the predicted responses can be transferred between different instruments via a regression approach. Furthermore, we applied the predicted responses to estimate the concentrations of the compounds without standard substances. The approach was validated by quantifying pesticides and mycotoxins in six different cereal samples. For applicability, the accuracy of the concentration prediction needs to be compatible with the accuracy of the effect (e.g. toxicology) predictions. We achieved an average quantification error of 5.4 times, which is well compatible with the accuracy of the toxicology predictions.
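To make the "error in times" and the concentration estimate concrete, the following is a minimal Python sketch and not the code used in this study (see Code S1 for that). It assumes that a predicted response has already been converted into an instrument-specific response factor (signal per unit concentration), that an error of "x times" means the fold error max(predicted/reference, reference/predicted), and that the average is an arithmetic mean. All function names and numbers are hypothetical.

```python
def estimate_concentration(signal, response_factor):
    """Estimate concentration as signal divided by response factor.

    Assumption: response_factor is expressed as signal per unit concentration.
    """
    if response_factor <= 0:
        raise ValueError("response factor must be positive")
    return signal / response_factor


def fold_error(predicted, reference):
    """Fold error ('times'): always >= 1, symmetric for over- and underestimation."""
    if predicted <= 0 or reference <= 0:
        raise ValueError("values must be positive")
    return max(predicted / reference, reference / predicted)


def mean_fold_error(predicted, reference):
    """Arithmetic mean of the fold errors (assumed aggregation)."""
    errors = [fold_error(p, r) for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)


# Hypothetical example: two compounds with made-up signals and response factors.
signals = [2.0e5, 5.0e4]
response_factors = [1.0e6, 1.0e5]
spiked = [0.1, 1.0]  # hypothetical spiked concentrations

predicted = [estimate_concentration(s, rf)
             for s, rf in zip(signals, response_factors)]
print(predicted, mean_fold_error(predicted, spiked))
```

In this made-up example the first compound is overestimated twofold and the second underestimated twofold, so both contribute a fold error of 2.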

logIE values collected from previous studies and measured in this study.
Table S3 The eluent compositions used for model development in ESI positive mode.
Table S4 The eluent compositions used for model development in ESI negative mode.
Table S5 Classifications of studied compounds using ClassyFire.
Table S6 The most prominent superclasses covered by the compounds included in this study.
Table S7 Compounds used in validation and application study.
Table S14 Comparison of errors in concentration prediction by compound in case of pesticides and mycotoxins in cereal matrices using three different approaches.
Table S15 Comparison of errors in concentration prediction by matrix in case of pesticides and mycotoxins in cereal matrices using ionization efficiency prediction.
Table S16 Properties of pesticides studied in application study.
Table S17 Raw data collected in application study in example of pesticides and mycotoxins in cereals.
Table S18 Summary of data used for model development and concentration prediction.
Figure S1 A comparison of the chemical space coverage based on logP values.
Figure S2 A comparison of the chemical space covered by compounds included in this study (training, test, and validation sets) in comparison to (left) NORMAN, (middle) HMDB, and (right) DrugBank databases.
Figure S3 The PCA analysis of the compounds from training, test and validation set based on the PaDEL descriptors.
Figure S4 PCA analysis of the training dataset compounds (n = 353).
Figure S5 Comparison of prediction errors ionization efficiencies between acetonitrile and methanol containing solvents in ESI positive mode.
Figure S6 Comparison of the prediction error of ionization efficiencies between neat acetonitrile and methanol in ESI positive mode.
Figure S7 Comparison of the prediction error of ionization efficiencies in acetonitrile containing solvents in ESI negative mode.
Figure S8 Comparison of the prediction error of ionization efficiency between different solvents in ESI negative mode.
Figure S9 Correlation between logIE values measured for set of compounds in the same eluent composition on different mass spectrometric setups.
Figure S10 Comparison of the prediction error of ionization efficiency values for different instruments (Table S7) in ESI positive mode.
Figure S11 The predicted ionization efficiency values for the validation compounds relative to the measured values.
Figure S12 Comparison of predicted and spiked concentration in case of pesticides in cereal samples.
Figure S13 Comparison of the prediction error of ionization efficiencies for different ionization efficiency groups in ESI positive mode.
Figure S14 Comparison of the prediction error of ionization efficiencies for different ionization efficiency groups in ESI negative mode.
Code S1 Code used for model development.

Eluent parameters
The viscosity of the organic modifier-water binary mixture [1][2][3] was calculated using a general model. The surface tension of the organic modifier-water binary mixture [4,5] was calculated using a general model (Eq. 2). The polarity index of the organic modifier-water binary mixture [6] was calculated using a general model.

57 1 AVP-0
Weighted path 58 1 WTPT-2
Topological charge 1 2 GGI3, GGI4
Topological distance matrix 50 2 VE1_D, VE2_D
Wiener numbers 59 1 WPATH

The principal component analysis was conducted based on the PaDEL descriptors; shown are the scores plots of the first two principal components. In all cases, the first and second principal components explain a relatively small part of the total variance (21 to 29%). This is expected, as the chemical space has high dimensionality and includes compounds from very different classes.

Figure S3
The PCA analysis of the compounds from training, test and validation set based on the PaDEL descriptors. The first two principal components explain ca. 30% of the total variance. Each dot represents one compound.

Figure S4
PCA analysis of the training dataset compounds (n = 353). Violet dots denote the compounds used for studying the solvents. To choose the set for studying solvents, the first 18 principal components (described variance 70.7%) were used. For clarity, the first three principal components are presented. Each dot represents one compound.

Boxplots
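For the boxplots, the lower line represents the 1st quartile ($Q_1$), the line in the middle the 2nd quartile (median) and the upper line the 3rd quartile ($Q_3$). Whiskers are found according to the formulas:

\[ \text{upper whisker} = \min\left(\max(x),\; Q_3 + 1.5 \cdot \mathrm{IQR}\right) \]
\[ \text{lower whisker} = \max\left(\min(x),\; Q_1 - 1.5 \cdot \mathrm{IQR}\right) \]

where $x$ denotes the plotted values and $\mathrm{IQR} = Q_3 - Q_1$ is the interquartile range.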
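The whisker rule above can be sketched in Python as follows. This is an illustrative sketch only: the function name `boxplot_stats` is hypothetical, and the quartile convention (the "inclusive" method of `statistics.quantiles`) is an assumption, since the plotting software used for the figures may compute quartiles slightly differently.

```python
import statistics


def boxplot_stats(values):
    """Return quartiles, whisker ends and outliers for one boxplot group."""
    data = sorted(values)
    # Assumption: quartiles computed with the "inclusive" interpolation method.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Whiskers follow the min/max rule stated in the text.
    upper = min(max(data), q3 + 1.5 * iqr)
    lower = max(min(data), q1 - 1.5 * iqr)
    # Points beyond the whiskers are drawn as dots (outliers).
    outliers = [v for v in data if v < lower or v > upper]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": lower,
        "upper_whisker": upper,
        "outliers": outliers,
    }


# Hypothetical prediction errors for one group; 100 lies beyond the upper whisker.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

With this rule the upper whisker is capped at the 3rd quartile plus 1.5 times the interquartile range, so a single extreme value appears as an outlier dot instead of stretching the whisker.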

Figure S5
Comparison of the prediction errors of ionization efficiencies between acetonitrile- and methanol-containing solvents in ESI positive mode. Results are divided into groups by water phase pH and organic modifier content. The comparison is based on the intersection of compounds measured in methanol as well as in acetonitrile. The compared results were measured on one instrument. Each datapoint corresponds to one compound-solvent combination. Dots represent outliers.

Figure S6
Comparison of the prediction error of ionization efficiencies between neat acetonitrile and methanol in ESI positive mode. Results are divided into groups by the type of pH-adjusting additive. The comparison is based on the intersection of compounds measured in methanol as well as in acetonitrile. The compared results were measured on one instrument. Every datapoint corresponds to one compound. Dots represent outliers.

Figure S7
Comparison of the prediction error of ionization efficiencies in acetonitrile-containing solvents in ESI negative mode. Results are divided into groups by water phase pH and organic modifier content. The comparison is based on the intersection of compounds measured in all pH groups. The compared results were measured on one instrument. Every datapoint corresponds to one compound. Dots represent outliers.

Figure S8
Comparison of the prediction error of ionization efficiency between different solvents in ESI negative mode. The comparison covers the intersection of compounds in the studied eluent compositions. Every datapoint corresponds to one compound-solvent composition combination. Dots represent outliers.

Figure S9
Correlation between logIE values measured for a set of compounds in the same eluent composition on different mass spectrometric setups. The intersection of compound-solvent combinations studied with Agilent XCT and Waters Synapt G2 is too small to study the correlations. Each dot represents one compound-solvent pair.

Figure S10
Comparison of the prediction error of ionization efficiency values for different instruments (Table S7) in ESI positive mode. Every datapoint corresponds to one compound-solvent composition combination. Dots represent outliers.
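The between-instrument comparison of logIE values (Figure S9) can be illustrated with a short Python sketch. It is not the code used in this study: the function names, the minimum of three shared pairs, and all numbers are hypothetical. It assumes that each instrument's data is a mapping from a (compound, solvent) pair to a logIE value, and that only pairs measured on both instruments are compared.

```python
import math


def shared_pairs(instrument_a, instrument_b):
    """Return logIE values for compound-solvent pairs measured on both instruments."""
    keys = sorted(set(instrument_a) & set(instrument_b))
    return [instrument_a[k] for k in keys], [instrument_b[k] for k in keys]


def linear_fit(x, y, min_pairs=3):
    """Ordinary least-squares line y = slope * x + intercept, plus Pearson r.

    Assumption: fewer than min_pairs shared pairs is treated as too few
    to study the correlation.
    """
    n = len(x)
    if n < min_pairs:
        raise ValueError("too few shared compound-solvent pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical logIE values keyed by (compound, solvent).
setup_1 = {("cmpd_A", "s1"): 1.0, ("cmpd_B", "s1"): 2.0,
           ("cmpd_C", "s1"): 3.0, ("cmpd_D", "s1"): 4.0}
setup_2 = {("cmpd_A", "s1"): 1.5, ("cmpd_B", "s1"): 2.5,
           ("cmpd_C", "s1"): 3.5, ("cmpd_E", "s1"): 0.3}

x, y = shared_pairs(setup_1, setup_2)
print(linear_fit(x, y))
```

A fitted line of this kind is one simple way to map logIE values from one setup onto the scale of another; the exact regression used for the transfer in this work is given by the main text and Code S1, not by this sketch.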

Figure S11
The predicted ionization efficiency values for the validation compounds relative to the measured values. The validation set included 35 pesticides and mycotoxins. Every datapoint corresponds to one compound-solvent composition combination. All measurements were done on an Agilent 6495 triple quadrupole instrument with a Jet Stream ionization source.

Figure S12
Comparison of predicted and spiked concentration in case of pesticides in cereal samples.

Figure S13
Comparison of the prediction error of ionization efficiencies for different ionization efficiency groups in ESI positive mode. Every datapoint corresponds to one compound-solvent combination. Dots represent outliers.

Figure S14
Comparison of the prediction error of ionization efficiencies for different ionization efficiency groups in ESI negative mode. Every datapoint corresponds to one compound-solvent combination. Dots represent outliers.