DIAMetAlyzer allows automated false-discovery rate-controlled analysis for data-independent acquisition in metabolomics

The extraction of meaningful biological knowledge from high-throughput mass spectrometry data relies on limiting false discoveries to a manageable amount. For targeted approaches in metabolomics a main challenge is the detection of false positive metabolic features in the low signal-to-noise ranges of data-independent acquisition results and their filtering. Another factor is that the creation of assay libraries for data-independent acquisition analysis and the processing of extracted ion chromatograms have not been automated in metabolomics. Here we present a fully automated open-source workflow for high-throughput metabolomics that combines data-dependent and data-independent acquisition for library generation, analysis, and statistical validation, with rigorous control of the false-discovery rate while matching manual analysis regarding quantification accuracy. Using an experimentally specific data-dependent acquisition library based on reference substances allows for accurate identification of compounds and markers from data-independent acquisition data in low concentrations, facilitating biomarker quantification.


Targeted extraction, scoring and FDR estimation
The false discovery rate (FDR) was introduced, handling the results of the increasing amounts of genomics data. It is the expected ratio of false-positive classifications (false discoveries) to the total number of positive classifications.

FDR = FP / (FP+TP)
In this context, "discovery" is a statistical term rooted in the context of positives and negatives. In our context, the "positives" would be true quantification events (e.g. a signal in the XIC that corresponds to the analyte of interest) while the "negatives" would be false quantification events (e.g. a signal / region in the XIC that does not correspond to the analyte of interest and is either chemical noise or belongs to another analyte). In original terminology as introduced in 1995, the "discovery" stands for the items that are labeled as "positive", and hence could be true positives or false positives, in a gene expression experiment as the genes that you label as differentially expressed 1 . In 2007, Elias et al. introduced the concept of target-decoy FDR in mass spectrometry-based proteomics 2 , where it is used to distinguish correct from incorrect peptide identifications. Unfortunately, creating plausible decoys in metabolomics is not as straightforward since methods like shuffling or reversing a sequence do not work in metabolomics. However, targetdecoy based FDR estimation was introduced recently for large scale untargeted metabolomics studies 3,4 . Scheubert et al. presented methods to ensure the consistency of the decoy spectra by using fragmentation tree re-rooting 3 . First, a fragmentation tree is constructed for the original spectrum identification assigning fragments compatible with the metabolite substructures. In a second step, this fragmentation tree will be re-rooted. A new root is chosen, leading to tree rearrangements by shifting the fragmentation reaction order. The results are new potential fragments based on the original metabolite. The generated decoys should have a similar probability of occurring in the sample and could represent the same metabolite -a decoy. The FDR for hits in a spectral library database can be estimated with this method. In the targeted setting, the peak group FDR estimation works differently. Experimental specific targets and decoys are added to the assay library (prior knowledge database) used for targeted extraction. For available targets and decoys, peak groups are extracted and scored ( Supplementary Fig. 1). A discriminant score (d-score) distribution is computed using a linear discriminant analysis based on the available subscores, and statistical error estimates are derived by fitting a null distribution [5][6][7] (Supplementary Fig. 2). In terms of FDR control, the importance or span of the FDR depends on the research question. For example, a low number of false positives in clinical biomarker identification is important (1% -5% FDR). On the other hand, if the aim is to find new potential biomarkers, an FDR of 10% might still be valid since these must be further validated.

PyProphet model performance
Performance of the PyProphet 6,7 linear discriminant analysis for metabolomics data. For MS1 and MS2 scoring scores, with a low cross-correlation, were used. Using semi-supervised learning, a combined discriminant score was established. The group discriminant-score (dscore) distribution and density show the decoy and target population results ( Supplementary  Fig. 2a,b). Decoy and false targets are nicely separated from the true targets, based on the dscore scoring. The p-value density histogram shows an anti-conservative p-value distribution estimated based on the Storey method 8 .
Supplementary Figure 2 | PyProphet target decoy evaluation. a,b) The group discriminant-score (d-score) distribution and density show the decoy and target population results. Decoy and false targets are nicely separated from the real targets, based on the d-score scoring. c) The p-value density histogram showing anti-conservative p-value distribution estimated based on the Storey method.

Combining identification information between collision energies
Depending on the molecular mass and structure of the metabolites the fragmentation behavior may differ, depending on the collision energy. Multiple collision energies or collision energy ranges may boost library performance 9 . Here, combining the two collision energy ranges (20-50 eV & 50-80 eV) can boost assay library coverage by 11% ( Supplementary Fig. 3).

Supplementary Figure 3 | Library coverage.
Identification at MS1 level was performed using AccurateMassSearch (AMS). The library with fragment annotation at MS2 level and 3 transitions (T3) was generated using the AssayGeneratorMetabo (AGM). a) Library generation using 20-50 eV data has a coverage of 66% detected and identified compounds. b) Combining detected and identified compounds from both collision energy ranges (20-50 eV, 50-80 eV) increases the coverage of the target-decoy library to 77%.

Optimizing for unique identifications
To investigate the effects of assay redundancy and specificity within our assay library, we first used computational models to calculate nonredundant theoretical assays, also known as unique ion signatures (UIS), for a given metabolomic background [10][11][12] . UISn is defined as a set of top n mass-to-charge values (precursor and fragment m/z) that maps exclusively to one metabolite in the metabolome to be analyzed 22 . For this analysis, we simulated classic MS methods (MS1, MRM, SWATH) using the assay library and NIST 17 LC/MS as a combined background with varying mass accuracy for both the precursor m/z window (MS1) and the fragment m/z window (MS2) to compare differences between UIS1, UIS2 and UIS3. This analysis was performed for data acquired in both low (20-50 eV) and high (50-80 eV) ranges of collision energy. This analysis demonstrates that SWATH outperforms MS1-only analyses, while performing comparably to MRM-based analyses when using high accuracy fragment ion information only (33.2%, 56.1% and 62.8% unambiguous compounds respectively for MS1-only, SWATH Variable Windows/25 ppm and MRM UIS3, Supplementary Fig. 4a). However, with increased MS1 accuracy, SWATH outperforms MRM-based analyses (with a few transitions) due to its capability of extracting both high-resolution MS1 precursor ion traces as well as highresolution fragment ion traces for any analyte of interest (93.4% unambiguous compounds for SWATH 25pm/25ppm UIS3, simulating both MS1 and MS2 XIC analysis, Supplementary Fig.  4a, rightmost bars). Based on these overall results and our previous study 13 , scoring is based on both MS1 and MS2 information, in addition to retention time and fragment ion relative intensity .

Supplementary Figure 4 | Percentage of Unique Compounds for different MS methods by comparison of UIS1, UIS2
and UIS3. Using the assay library and the NIST 17 LC/MS database as background, simulations were conducted for each analyte (query) by setting different mass tolerances associated with different MS methods. The following MS1/MS2 windows were set: MS1-only: 25ppm/-; Multiple Reaction Monitoring (MRM): 0.7 Da/0.7 Da; and Data-Independent Acquisition (DIA/SWATH): Variable Windows/25 ppm, 25ppm/25ppm. The percentage of compounds in the assay library with no interference (the background for each query based on the given parameters), known as the percentage of unique compounds (y-axis), is calculated for each method (x-axis) a) for data acquired at collision energy ranges of 20-50 eV b) and 50-80 eV.

Variable SWATH windows
The variable SWATH windows were assessed based on the plasma matrix using the SWATH Variable Window Calculator (SCIEX). The target number of windows was set to 8, lower and upper m/z limit to 100 and 900, respectively. "Round bin edges to x figures" was set to 1 and a window overlap of 1 Da was chosen. In addition, the minimum window was set to 3 and CES to 5. With these settings, the windows in Supplementary Table 1

Dilution series
The concentration of the 250 pesticides over the 1:4 dilution series (Supplementary Table 2). The concentration of the metabolite is dependent on its molecular mass, which ranges from 141 to 874 g/mol. Using an injection volume of 5 µL the concentration would range from the highest concentration (step1 -min: 5717.501 fmol/µL, max 35460.666 fmol/µL) to the lowest concentration (step 10 -min: 0.022fmol/µL, max 0.135 fmol/µL) covering 5 orders of magnitude ( Supplementary Fig. 5).
Supplementary Figure 5 | Concentration of the pesticides over the dilution series. The concentration gradient over the dilutions series covers 5 orders of magnitude and depends on the molecular mass of the pesticide. Covering minimum 5717.501 fmol/µL (step 1) to 0.022fmol/µL and maximum 35460.666 fmol/µL to 0.135 fmol/µL (step 10) over the 1:4 dilution series.

FDR calibration at different collision energies
The FDR based on the fragmentation tree re-rooting approach has a conservative tendency, with a slight overestimation for data acquired at lower ranges of collision energy (20-50 eV). In comparison, FDR estimates for data acquired at higher ranges of collision energy (50-80 eV) demonstrated an increased abundance of overlapping fragments resulting in more complicated analyses ( Supplementary Fig. 6). We determined the precision and recall based on different estimated FDR thresholds using the best peak group rank, to verify the accuracy of the classifier. The area-under-the-curve is 96% and 93%, respectively.

Quantification behavior of detected metabolites
Three different quantification behaviors were determined, illustrated by the following metabolites ( Supplementary Fig. 7). Amidosulfuron (exact mass: 369.04 Da) has a low initial intensity and was detected in the first two dilutions. Azaconazole (exact mass: 229.02 Da), which had a higher measurable initial intensity was detected to a dilution of 1:1024. Fenpyroximate (exact mass: 421.20 Da), with the highest initial intensity, was detected to a dilution of 1:262144. Depending on the initial measurable intensity, the pesticides could be detected in samples with a higher dilution.

Supplementary Figure 7 | Quantification behaviors over the dilution series for different pesticides. a) Amidosulfuron
(exact mass: 369.04 Da) has a low initial intensity and was detected in the first two dilutions. b) Azaconazole (exact mass: 229.02 Da) had a higher initial measurable intensity and was detected to a dilution of 1:1024. c) Fenpyroximate (exact mass: 421.20 Da) was detected to a dilution of 1:262144 and had the highest initial intensity. a-c) indicate median, 25th and 75th percentiles (middle line, Q1 and Q3 within the box respectively), including 1.5x interquartile range whiskers. Visualization of outliers were disabled to demonstrate the overlay with a dot plot to depict the underlying distribution (n per dilution = 3 technical replicates -benchmark dataset).

Limit of detection
We calculated the LOD, using the unfiltered results and the definition of (LOD = S/N > 10) for each pesticide (Supplementary Table 3). We used the intensity at the lowest dilution the compound was still detected via targeted extraction as noise. From there we calculated the concentration based on the exact mass, the initial concentration of 1 ng/µl and the corresponding dilution of the compound. Please see lod.RMD at https://github.com/oliveralka/DIAMetAlyzer_additional_code for details. Quantification behavior at higher collision energy ranges

Supplementary
All results for pesticides measured at a collision energy range from 50 to 80 eV were filtered using a 5% FDR threshold. At half of the dilution series (1:1,024) 58 metabolites were detected ( Supplementary Fig. 8a). The difference in mean standard deviation regarding the theoretical concentration of the automatic and manual analysis is shown in Supplementary Fig. 8b. The median coefficient of variation (CV) of quantified signals was below 20% in all technical replicates ( Supplementary Fig. 8c).

Comparison of identification performance between DIAMetAlyzer and MS-DIAL
We performed a comparison with MS-DIAL (v. 4.60) based on our ground truth pesticide mix dataset. Here, our constructed assay library was used for the identification of deconvoluted spectra in MS-DIAL. First, the assay library was converted to a spectral library (.msp) using an in house script (https://github.com/oliveralka/MetaboAssayLibToMSPConversion Then the results were compared to the ground truth dataset and DIAMetAlyzer results at 5% FDR (comparison_ground_truth_pyprophet_ms-dial.py). The comparison was visualized using an R script (vis_comp_MS-DIAL.Rmd). DIAMetAlyzer was able to identify 156 true positives and 3 false positive compounds in comparison to the ground truth. MS-DIAL with an identification threshold of 0.8 and retention time scoring enabled, was able to identify 84 true positives, 5 false positives and was not able to identify 70 compounds (false negatives) ( Supplementary Fig. 9). Here, we could show the advantage of DIAMetAylzer targeted extraction strategy with false-discovery rate control based on reference compounds in comparison to non-targeted deconvolution. We would like to state that the functionality of MS-DIAL is focused on the identification of unknown compounds and is dependent on the spectral library space given to the software. Additional methods for the comparison with MetaboDIA using the AMD dataset Raw data of MTBLS417 (https://www.ebi.ac.uk/metabolights/MTBLS417) was provided by the authors and data was processed as previously described 14 . DDA data was centroided using the qtofpeakpicker and msconvert for conversion into mzML/mzXML. DIA data conversion was performed using msconvert. MetaboDIA (Version 1.3) processes DDA and DIA data in centroided mode (mzXML), in contrast to DIAMetAlyzer which uses centroided DDA (mzML) and profile DIA (mzML) data. An accurate mass search database consisting of HMDB (4.0) and LIPIDMAPS (092020) was constructed, with the entries from the LIPIDMAPS structure file converted into a readable format (Lipid_Enrichtment_prepare_AMS.knwf).
Afterwards, duplicates were filtered from the database (reoderDuplicatesMapping.py), and the HMDB and LIPIDMAPS entries were combined (AppendIdentifiersFromOtherDB.py). For MetaboDIA both databases were combined in a Database.txt file. MetaboDIA was used for the analysis and in case of DIAMetAlyzer, an OpenMS development version (14f627e) was used with support for SIRIUS 4.5, allowing internal decoy generation and feature linking for targeted and untargeted experiments using the AssayGeneratorMetabo library generation. For MetaboDIA, the DIA data needed to be processed with DIA-Umpire 15 (Version 2.1.6) (diaumpire_slurm_mtbls417.sh, params_diaumpire_mtbls417.se_params). DDA and DIA data were processed with XCMS 16 and CAMERA 17 for feature detection and identification. The DDA/DIA workflow from MetaboDIA was run, by specifying the related files, database and adducts (metaboDIA_run.Rmd). Feature detection, adduct grouping and precursor correction for DIAMetAlyzer was performed using KNIME (20200922_processing_MetaboDIA_FFM_MAD_HRPMC.knwf) followed by accurate mass search (AccurateMassSearch.sh). To decrease the runtime of data processing, the assay generation and targeted extraction step was performed on a cluster infrastructure. The library was constructed by either using prior MS1 identification (agm_mt1_02.sh) or with the addition of non-identified features (agm_mt1_unknown_02.sh). The spectral library stemming from MetaboDIA was converted into an assay library (convertSpetralLibrarytoAssayLibrary_1.4.py) and decoys were generated using the DecoyGeneratorMetabo fragdb method (generateMultiDecoys.sh). A combined library was constructed using the DIAMetAlyzer and the converted MetaboDIA library (construct_combined_library.ipynb). All libraries were used for targeted extraction similarly (osw_mt1_67_02.sh). Furthermore, statistical validation was performed using PyProphet (merge_score_export_pyprophet.sh). Results were then processed by DIAlignR 18 , which was extended to allow for metabolomics data processing, to reassess values based on chromatogram retention time alignment (DIAlignR.Rmd). In the following, features with a FDR below 0.05 and peak group level 1 were used for post-processing analysis, which includes library comparison based on the molecular formula, adduct and retention time of a feature (comparison_lib.ipynb). Data was quantitatively compared based on molecular formula and adduct (compare_lib_and_quant.ipynb), with a further comparison of the quantification matrix between features found in MetaboDIA and DIAMetAlyzer to assess the difference of nontargeted vs targeted extraction/quantification (compare_analysis.ipynb). Assessment of differences between the groups was performed using limma with Benjamini & Hochberg correction for multiple testing (quant_analysis.Rmd).

Combining assay libraries
The created AssayGeneratorMetabo Node in our DIAMetAlyzer pipeline provides a robust, reproducible, automated method to build the target-decoy assay library in the OpenMS ecosystem using the fragmentation tree re-rooting method 3 . The combination of spectral/assay library information of other tools is not straightforward due to interoperability issues between the tools and their used data format.
In most cases individual solutions need to be provided/developed to convert the original format to the OpenSWATH assay library format. In the case of MetaboDIA we provide a script for the conversion (convertSpetralLibrarytoAssayLibrary_1.4.py). The fragmentation re-rooting method (SIRIUS) cannot be performed since the MS1 and MS2 spectra information needed is not available on assay library level. For that purpose, we investigated decoy generation methods on the assay library level, which are easier to use and still perform reasonably well. Here, we provide the DecoyGeneratorMetaboTool (https://github.com/oliveralka/DIAMetAlyzer_additional_code/blob/master/additional_tests_ decoy_methods/TOOL/DecoyGeneratorMetaboTool_2.0.py) 19 . For further details regarding the decoy methods on library level please see Supplementary Fig. 18, 19. The so generated target and decoy assay library can then be appended to the one from the AssayGeneratorMetabo node. The combined library can be used in the DIAMetAlyzer workflow for the DIA data analysis.

Library comparison (MetaboDIA vs DIAMetAlyzer)
We compared the different libraries and their entries based on molecular formula, adduct and retention time information (Supplementary Fig. 10). MetaboDIA produces the smallest library, followed by the native DIAMetAlyzer. Combining both libraries leads to an increase in the number of features and stays on the conservative side. Allowing features without MS1 identification in addition to the native DIAMetAlyzer library leads to a further increase of features in the library. Here, SIRIUS is used for the assessment of the molecular formula and the fragment annotation internally.

The difference in DDA feature detection and linking of MetaboDIA and DIAMetAlyzer
We are using the DDA data to generate our targeted assay library. Two important steps in this process are feature detection and feature linking. We used the OpenMS feature detection algorithm (FeatureFinderMetabo) developed by Kenar et al., 2014 20 . A short description of the differences of the feature detection algorithms (XCMS 16,17 , MzMine 21 , Maven 22 ) can be found in the publication. In short, FeatureFinderMetabo was compared to XCMS using a simulated ground truth dataset. Here, a similar feature overlap of around 66% was reported, as well as, a higher recall, precision and F-score for the FeatureFinderMetabo. For the feature detection and linking comparison of MetaboDIA and DIAMetAlyzer, we used detected features with a minimum of one mass trace (monoisotopic peak) and linked in at least 50% of the samples. Based on the assay library comparison, DIAMetAlyzer and MetaboDIA overlap in 113 features, whereas MetaboDIA (XCMS/CAMERA) and DIAMetAlyzer (FeatureFinderMetabo) found 39 and 158 exclusive features respectively. To compare the feature detection and linking based on the library comparison, overlapping and nonoverlapping features were exported to an excel sheet, manually converted to edta and automatically converted to featureXML. The featureXML was then used for visualization in OpenMS TOPPView. Two DDA files were chosen randomly (one of each sample group: PH697172_pos_IDA-PH697172; CS56422_pos_IDA-CS56422) and the features were manually inspected. Comparing the feature detection and feature linking results of the nonoverlapping features in more detail, only 8 of the 39 features in the MetaboDIA library have not been detected by DIAMetAlyzer (FeatureFinderMetabo). The other 31 features are filtered in the assay library generation, discussed in more detail below. In terms of feature detection XCMS/CAMERA, were in some cases not able to separate features ( Supplementary Fig 11a), but were able to better detect features with a tailing mass trace ( Supplementary Fig 11b) or features that start with a maximum (e.g. at the beginning of the retention time gradient) ( Supplementary Fig. 11c,d). The DIAMetAlyzer (FeatureFinderMetabo) picks up low intensity features more consistently (Supplementary Fig. 12a-c). In addition, the manual inspection suggests that the FeatureFinderMetabo was able to detect and split multiple features "hidden" in longer mass traces (Supplementary Fig. 12d.) In general, we think a more refined parameter optimization of both algorithms would still be able to detect additional features in this dataset. The presented features in Figures 11 and 12 do not represent the main population of features detected in the analysis. In case of Figure 11 c,d, the analytes are eluting in the beginning of the gradient and are somewhat cut-off but are still features which can be detected with the XCMS algorithms. In the case of Figure 12 a-c, the feature intensity span in the file was roughly from 7,523 to 72,903,272. All features shown have a maximum intensity of 75,000 and can be deemed low intensity. The features presented in Figures 11 and 12 do not show an optimal chromatographic shape with around 6-8 peaks. Supplementary Fig. 13 shows the chromatograms two mid-range intensity features and one high intensity feature as representatives of the main feature population in the dataset. Please be aware that due to different axis ranges in intensity and retention time, the figures are not directly comparable, but should give an indication of the feature population in the sample. intensity features with an intensity of around 2,000,000 were detected in both algorithms. c) High intensity feature with an intensity of 7,000,000. All three features were extracted from DDA data and are shown as representation of the chromatographic outline of the main feature population in the dataset.
A big difference between the tools is introduced in the feature linking/assay library generation step. MetaboDIA detects and links features independent of their identification. In DIAMetAlyzer the feature linking is performed after fragment annotation. Here, features with less than four fragment peaks in the MS2 spectra are filtered out by SIRIUS. Further filtering depends on the parameterisation of the workflow. Using solely known compounds, features without identification or without a correct fragmentation annotation are filtered out before the linking process. This leads to the filtering of linked features based on occurrence of their identification. For example, if a feature was linked in 67 samples, but was only correctly identified in 13 samples, the feature in case of the DIAMetAlyzer is filtered, since it is not available with a valid identification in 50% of the samples. This step is performed to ensure a high-quality assay library based on identification and fragment annotation. In conclusion, the feature detection in DIAMetAlyzer can detect more features, but the linking is more stringent, dependent on the application, to ensure a high-quality assay library. One of the strong points of DIAMetAlyzer is that it allows for the aggregation of information in the assay library e.g. based on the findings of other tools or spectral libraries. In terms of stability and reproducibility, since feature detection, adduct grouping and identification are distinct steps to construct the assay library, a change in parameters may lead to differences in the results. It is advisable to optimize the feature detection parameters to the specific dataset, to improve the feature coverage. The changing of parameters should not impair the general robustness of the workflow. In terms of reproducibility, KNIME allows to save and export the workflow with the adjusted parameters to fit the experimental setting. We would advise supplying the specific workflow or script which was used for the analysis. This will allow for reproducibility independent of the parameter changes.

Targeted vs non-targeted quantification (MetaboDIA vs DIAMetAlyzer)
Based on the set of molecular formulas and adducts, MetaboDIA was able to quantify 54 features in comparison to DIAMetAlyzer, which was able to quantify 440 features using the same library (MetaboDIA) (Supplementary Fig. 14a). This discrepancy was discussed with the developers of MetaboDIA. They stated that MetaboDIA is not in active development anymore. Due to its dependence on XMCS, CAMERA and DIA-Umpire, changes in one of these software suites may lead to such performance issues. Based on the quantified overlapping features the targeted extraction method used in DIAMetAlyzer has a clear advantage over the method used by MetaboDIA and is in most cases able to quantify a compound in every sample ( Supplementary Fig. 14b).
Supplementary Figure 14 | Quantification comparison between MetaboDIA and DIAMetAlyzer. a) Based on the set of molecular formula and adduct MetaboDIA was able to quantify 54 features in comparison DIAMetAlyzer was able to quantify 440 features using the same library (MetaboDIA) b) Shows that most features, which were quantified by both tools, have quantitative values in 50-60 samples when DIAMetAlyer is used. In contrast, a higher variation in this frequency is seen for MetaboDIA.

Patient group comparison of the AMD data
DIAMetAlyzer + Unknown analysis was used to perform a principal component analysis to assess the class separation based on the determined features. Using all available features ( Supplementary Fig. 15) or features that showed a significant difference between groups (5% FDR). In both cases the control shows a distinct group but cannot be separated by PC2. The separation in PC1 direction leads to the conclusion of a batch effect since a few samples from each group (CNV, PCV) are separated by PC1. This occurrence does not arise from the extraction batch or injection order. Unfortunately, not enough metadata is present to explain this effect.

Differential expression analysis of AMD data
Differential expression analysis was performed with limma to assess the differences between the control, CNV and PCV groups. In the case of control vs CNV 208 analytes show a significant difference between the groups. 79 analytes in case of control vs PCV. 49 of these analytes are found to be differentially regulated in both groups. In the comparison of CNV and PCV, no significant differentially regulated analytes could be detected. The compound classes of the differentially regulated analytes were retrieved using MetaboAnalyst 23 (Version 5.0)over representation analysis -based on the first compound name annotated by accurate mass search. Here, the super-, main-and sub-classes were identified for metabolites and lipids. CNV and PCV in comparison to the control group based on the differential expression analysis via limma, consist of metabolites from compound classes such as glycerophospholipids, organic heterocyclic compounds, sterol lipids, fatty acids, amino acids and dipeptides.

Quantification and evaluation of AMD biomarkers and additional candidates
Carnitines and their metabolites are mainly involved in fatty acid metabolism. Oleoylcarnitine (C25H47NO4; PCNV=0.002; PPCV=0.01), as well as L-Palmitoylcarnitine (C23H45NO4; PPCV=0.02), are upregulated by around 1.5 times in contrast to the control in both or PCV, respectively ( Supplementary Fig. 16a,b). These findings are following the research of an altered carnitine shuttle pathway in macular degeneration 24 . Our findings suggest that Linoelaidylcarnitine (C25H45NO4; PCNV=0.04; PPCV=0.03) which showed a similar increase in addition to the others might be a potential biomarker for AMD ( Supplementary Fig. 16c). Further possible biomarkers associated with AMD were determined from serum in previous studies, such as phenylalanine, hypoxanthine, tyrosine 25 . We found hypoxanthine levels were significantly increased in CNV (C5H4N4O; PCNV=0.006) by 3.9 times in contrast to the control, which affects the purine nucleotide cycle and can lead to apoptosis of photoreceptors 26,27 (Supplementary Fig. 16d). In addition, gamma-Glutamylphenylalanine (C14H18N2O5; PCNV=0.002; PPCV=0.0006) and gamma-Glutamylisoleucine (C11H20N2O5; PCNV=0.01; PPCV=0.04) were deregulated in both patient groups ( Supplementary Fig. 16e,f). Increased serum gamma-glutamyl transferase (GGT) levels were reported as risk factors for AMD 28 . This suggests that gamma-Glutamylphenylalanine with higher intensity by around 1.6 times in contrast to the control and gamma-Glutamylisoleucine (1.7 times increase) could be used as metabolic markers in this case. As a validation, dityrosine, which plays a role in oxidative stress and is associated with macular degeneration 29 , was increased by around 1.7 to 2.4 times in contrast to the control (C18H20N2O6; PCNV=0.002; PPCV=0.03) (Supplementary Fig. 17a). We found additional candidates without identification at m/z 601.271468 and a retention time 579 s (Unknown A; PCNV=0.0002; PPCV=0.000008) with an intensity increase of 5.2 and 5.7 in comparison to the control and at m/z 944.360964 and 311 s (Unknown B; PCNV=0.005; PPCV=0.03) with an intensity increase of 1.7 and 1.9 respectively (Supplementary Fig. 17b,c). An additional interesting aspect is the significantly deregulated compounds 5,8,11,14-Eicosatetraenoic acid (EPA) (C20H32O2; PPCV=0.01; PCNV=0.04), after removing one outlier in the CNV patient group, and 4,7,10,13,16,19-Docosahexaenoic acid (DHA) (C22H32O2; PPCV=0.006; PCNV=0.008) with an increase of 1.4 to 2.0 times in contrast to the control, have previously been associated with a reduced risk for neovascular AMD 30 (Supplementary Fig. 17d,e). An explanation for this finding in the patient groups could be an Omega-3 fatty acids rich diet, which is often advised to AMD patients due to their anti-inflammatory properties 31,32 . The identification results of the differential expression analysis are based on putative identifications via MS1 accurate mass search and MS2 fragment annotation, corresponding to a level 3 identification 33 . Here, to reach a level 1 identification, additional experiments in follow up studies are necessary to validate the potential biomarkers.

Further decoy generation methods and evaluation
Various potential decoy methods were evaluated. Fragdb: This method performs a random sampling of n fragments with a lower mass than the current precursor. Linmzperm: Instead of using masses directly this method uses the mass difference between the precursor and fragments. Here, the mass differences were calculated between the precursor and the first fragment, then the first and the second fragment and so on. Afterwards, the mass differences were shuffled randomly and subtracted from the precursor or previous fragment, generating new fragment masses with an in total similar mass difference. Since the last fragment always had the same total mass difference the mass of a -CH2 was added to the new fragment mass. If the same mass difference occurred twice, a -CH2 mass was added to the first fragment mass as well. Rtperm: This method performs a retention time permutation for the precursor and its fragments, within a certain minimal retention time difference. This difference should be higher than the retention time parameter set in OpenSWATH and has to be in the retention time range used in the experiment. The decoy retention time is generated such as decoy_rt = decoy_rt + ([0.0 -1.0] -0.5) x 400, where decoy_rt is initialized with the same value as the target retention time. Re-rooting: Here, the fragmentation tree-based method from Passatutto was used. The fragmentation trees were acquired using the fragment annotation via SIRIUS4. The SIRIUS4 format had to be parsed into a Passatutto compatible format. Then the method was called, and decoy spectra were used to extract transitions. In the case of overlapping transition and decoytransition masses after extraction, a -CH2 mass was added to the overlapping decoy transition. To ensure the same number of targets and decoys, if the re-rooting of the tree failed or the fragments were similar to the target ones, -CH2 was added to the original fragment masses as a fallback mechanism to ensure the generation of a decoy. The fallbacks were used in around 13% and 5% of the cases, respectively. Afterwards, the n highest intensity peaks were extracted to use in the target-decoy assay library. The following analysis was performed using the 20-50 eV collision energy data. With the introduced metabolomics score filter in PyProphet all decoy methods perform rather well ( Supplementary Fig. 18). The re-rooting method has a conservative tendency (overestimating the FDR slightly). Our aim was to establish a method, which can be used on a multitude of different datasets, in our humble opinion it is better to stay on the conservative side for FDR filtering, especially if the area-under-the-curve for most decoy methods are reported to be in the same range. In addition, we looked at the impact of adding a -CH2 mass at different steps in the decoy generation method. Adding a -CH2 mass is plausible for organic compounds since it can be added to the carbon backbone. There are two situations, where a -CH2 mass can be added to one or more decoy transitions. First, in the case of overlapping transition and decoy-transition masses after extraction, a -CH2 mass was added to the overlapping decoy transition (insertion -in). Second, if re-rooting (passatutto) of the fragmentation tree was not possible, -CH2 mass was added to all the target transitions, which were then used as decoy transitions (shift). These fallbacks were used in 13% and 5% of the cases respectively. Adding a -CH2 mass in these cases would allow it to have a similar number of targets and decoys, which is not ultimately necessary for the LDA approach used by PyProphet. Here, we tested all combinations of this method. Original: Re-rooting with insertion and the shift. WInwoShift: Re-rooting with insertion without shift. WoInwoShift: Re-rooting without insertion and without shift. WoInwShift: Re-rooting without insertion with shift. In terms of performance, there are no important differences between the described methods, in terms of FDR estimation and precision-recall ( Supplementary Fig. 19). A major difference in the disparity between the number of decoys for the specific sets (Number of decoys: original = 165, wInwoShift = 155, woInwoShift = 132, woInwShift = 142). As stated previously a slight discrepancy between the targets and decoys is not problematic for the LDA approach. We decided to further use the rerooting method with insertion and shift, to allow for a robust method applicable on a lot of different datasets. In our opinion, if the number of decoys is not far below the number of targets the other methods can be used in a similar fashion.

Comparison of DDA and DIA data
We visually compared the DDA and DIA XICs for the highest concentration and chose two representatives ( Supplementary Fig. 20). Both show a similar chromatogram shape on the MS1 level. At the MS2 level, the peak depends on the trigger time of the instrument (the chromatogram shape is based on the interpolation used in the Skyline visualization). For DIA MS2 the chromatogram is clearly visible and co-elutes exactly with the MS1 chromatogram.

Example workflow
Example workflow for the usage of the DIAMetAlyzer Pipeline in KNIME 34,35 ( Supplementary  Fig. 21). Inputs are the SWATH-MS data in profile mode (.mzML), a path for saving the new target-decoy assay library, the SIRIUS 36 (Version 4.0.1) executable, the DDA data (.mzML), custom libraries and adducts for AccurateMassSearch, the min/max fragment mass-to-charge to be able to restrict the mass of the transitions and the path to the PyProphet 6,7 executable. The DDA data is used for feature detection, adduct grouping, accurate mass search and forwarded to the AssayGeneratorMetabo. The constructed target library is used by the decoy generation node, which will call various python scripts to parse and reformat the input. In addition, Passatutto 3 is called and the target-decoy assay library exported. The assay library is used in combination with the SWATH-MS for OpenSWATH 37 and will be automatically analysed. PyProphet uses the results for scoring and exports a list of peak groups with associated FDR and quantitative values. For detailed parameters please check the workflow (.knwf).  The constructed target library is used by the decoy generation node, which calls various python scripts to parse, reformat the input, call Passatutto and export the target-decoy assay library. The target-decoy assay library is processed with the SWATH-MS data in OpenSWATH. The results are used by PyProphet for scoring and output a list of metabolites with their respective q-value and quantitative information.

Evaluation of the identification performance
The top 20 features with the highest significant differences between groups based on DIAMetAlyzer + Unknown quantification were used to assess the identification performance (Supplementary Table 4). Within DIAMetAlyzer the molecular formula for features without prior MS1 identification was determined via SIRIUS. In most of these cases, no MS2 spectral library match was detected in the GNPS database using the MASST Search (workflow release version 27), but in almost all cases a mirror match was detected stemming from another homo sapiens dataset, which suggests a valid feature without spectral library entries. Following such cases could potentially lead to the identification of new compounds which are related to the biological question. Further three MS1 identifications were directly validated by GNPS spectral library search. Other MS1 identifications could not be validated by GNPS, either since no matching MS2 spectrum could be found or MS2 spectral library search found other possible identification based on the spectrum with a cosine similarity between 0.72 -0.92. As in most metabolomics studies, the identification of compounds and their biological influence must be re-evaluated in follow up studies.