Metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics

Large-scale metabolite annotation is a challenge in liquid chromatogram-mass spectrometry (LC-MS)-based untargeted metabolomics. Here, we develop a metabolic reaction network (MRN)-based recursive algorithm (MetDNA) that expands metabolite annotations without the need for a comprehensive standard spectral library. MetDNA is based on the rationale that seed metabolites and their reaction-paired neighbors tend to share structural similarities resulting in similar MS2 spectra. MetDNA characterizes initial seed metabolites using a small library of MS2 spectra, and utilizes their experimental MS2 spectra as surrogate spectra to annotate their reaction-paired neighbor metabolites, which subsequently serve as the basis for recursive analysis. Using different LC-MS platforms, data acquisition methods, and biological samples, we showcase the utility and versatility of MetDNA and demonstrate that about 2000 metabolites can cumulatively be annotated from one experiment. Our results demonstrate that MetDNA substantially expands metabolite annotation, enabling quantitative assessment of metabolic pathways and facilitating integrative multi-omics analysis.

The coverage and correct annotation rates using different cutoffs for the annotation score.
The x-axis represents the cutoff of the annotation score. The y-axis represents the coverage rate (red) and correct rate (green). The upper, middle and lower lines correspond to the first, second and third quartiles (the 25 th , 50 th and 75 th percentiles).  Blue, yellow, green and red bars represent the confidence levels of metabolite annotations from grades 1, 2, 3 to 4, respectively. S13 S14 Supplementary Figure 11 The confirmation of the seed L-Arginine and its neighbor metabolites using chemical standards and online spectral libraries.  (top left). Then, ADP was used as a seed metabolite and its MS2 spectrum was assigned to one of its neighbor metabolites (AMP, C00020) as surrogate MS2 spectrum. Peak M346T819 (m/z 346.0558, retention time 819.3 s) was annotated as AMP with a DP score of 0.62, m/z error of 1.4 ppm, RT error of 0.2%. Then, AMP was S16 selected as a new seed metabolite and its MS2 spectrum was assigned to two of its neighbor metabolites (IMP, C00130; N6-(1, 2-Dicarboxyethyl)-AMP, C03794) as the surrogate MS2 spectrum. Peak M347T846 (m/z 347.0395, retention time 845.8 s) was annotated as IMP with a DP score of 0.96, m/z error of 0.58 ppm, RT error of 3.9%. Peak M462T927 (m/z 462.0657, retention time 927.1 s) was annotated as N6-(1, 2-Dicarboxyethyl)-AMP with a DP score of 0.62, m/z error of 1.1 ppm, RT error of 2.4%. Then IMP was selected as a new seed metabolite and its MS2 spectrum was assigned to one of its neighbor metabolites (a) Statistics of the annotation coverage, and correct, isomer and error rates among the top 3 candidates; (b) S18 statistics of correct, isomer and error rates among top n (n=1 to 10) annotations. The x-axis represents the top n (n=1-10) metabolite candidates for each peak. The y-axis represents the percentages for correct (purple), isomeric (blue) and erroneous (black) annotations.
(c)The annotation rates for in silico MS2 spectral match with different DP cutoffs. (d-e) Comparison of annotation isomeric rate (d) and erroneous rate (e) between CFM-ID and MetDNA in the positive mode dataset (validation experiment 1).
(f-h) Comparison of annotation correct (f), isomeric (g) and erroneous rate (h) between CFM-ID and MetDNA in the negative mode dataset (validation experiment 1).
(i-j) The distributions of DP scores for the spectral similarities between the standard MS2 spectra and in silico MS2 spectra in positive (i) and negative modes (j). The CE is 30 eV.
(k) The distributions of DP scores for the spectral similarities between the standard MS2 spectra and in silico MS2 spectra with positive and negative mode. The CE are 20 eV and 50 eV. annotations. The x-axis represents the top n (n=1-10) metabolite candidates for each peak. The y-axis represents the percentages for correct (purple), isomeric (blue) and erroneous (black) annotations.

Supplementary Figure 15
Validation of high annotation accuracy from MetDNA using negative mode of Drosophila aging dataset in validation experiment 3.
(a) Statistics of the annotation coverage, and correct, isomer and error rates among the top 3 candidates; (b) statistics of correct, isomer and error rates among top n (n=1 to 10) annotations. The x-axis represents the top n (n=1-10) metabolite candidates for each peak. The y-axis represents the percentages for correct (purple), isomeric (blue) and erroneous (black) annotations.

Supplementary Figure 16
The overlap of enriched metabolic pathways using different numbers of seed metabolites.

Supplementary Figure 17
The influence of the initial seed number to the final annotation result (dataset #1-Drosophila aging dataset).
(a) The distributions of the confidence levels for metabolite annotations using different numbers of initial seed metabolites (132, 124, 117, 105, 100, 78, 69, 56, 37, and 21). (b) The statistics for peak redundancy and metabolite redundancy using different numbers of initial seed metabolites. All top 5 ranked metabolite annotations with score larger than 0.4 were reserved for the statistics. The influence of the cutoff of annotation score on the numbers of annotated peaks using the correct and four types of misannotated seed metabolites.

Supplementary Figure 20
The dysregulated glycolysis pathway in Drosophila aging datasets.
(a) Diagram illustrates that the transcriptional and metabolic changes in glycolysis in 3d-and 30d-old WT and PRC2 long-lived mutants fruit flies. (b) Barplots show thirteen detected metabolites in 3d-and 30d-old WT and long-lived PRC2 mutant fruit flies. The y-axis represents the z-score of metabolites (mean ± SD), n = 10 biologically independent samples for each group.

Supplementary Figure 21
The dysregulated arginine biosynthesis pathway in Drosophila aging datasets.
(a) Diagram illustrates that the transcriptional and metabolic changes in arginine biosynthesis pathway in 3dand 30d-old WT and long-lived PRC2 mutant fruit flies. (b) The correlation network of genes and metabolites in the arginine biosynthesis pathway. One node represents one gene or one metabolite. The connections were established by Pearson correlation (Student's t-test, P-values < 0.05 and absolute Pearson correlation values > 0.7). (c) Barplots show five detected metabolites, Aspartate, Argininosuccinic acid, Fumarate, arginine and Citrulline in 3d-and 30d-old WT and long-lived PRC2 mutant fruit flies. The y-axis represents the z-score of metabolites (mean ± SD), n = 10 biologically independent samples for each group.

S27
Supplementary Table 1 The detailed information of 28 metabolites confirmed using chemical standards, METLIN, NIST, HMDB library and CFM-ID.  Table 4 Correct and four types of misannotated seed metabolites in positive mode of Drosophila aging dataset (dataset #1).
Correct and four types of misannotated seed metabolites were constructed based on m/z error, RT error and MS2 spectral similarity. And then both correct and misannotated seed metabolites gone through the MRN based recursive annotation. in-house spectral library, the NIST17 library and METLIN library, respectively. Meanwhile, the same numbers of non-reaction pairs were also randomly generated for the comparison. Specifically, for each metabolite, its neighbor metabolites in library (in-house, NIST17 or METLIN) were retrieved and utilized to construct the reaction pairs (RPs). Meanwhile, the metabolites in the library and have the smallest m/z error with the true neighbor metabolites were also retrieved and utilized to construct non-RPs. For example, if 'metabolite a' has a neighbor metabolite 'b' in the library, so 'a-b' is constructed as a RP. And then the 'metabolite c' with the smallest m/z error with 'metabolite b' in the library is used to construct the non-RP. The spectral similarity was scored using the dot-product function (DP score, see Methods). In our in-house library, more than 55.3% of neighbor metabolites have a DP score larger than 0.5. In contrast, only 5.2% of non-neighbor metabolites have a DP score larger than 0.5. In NIST library, more than 53.6% of neighbor metabolites have a DP score larger than 0.5. In contrast, only 3.6% of non-neighbor metabolites have a DP score larger than 0.5. In METLIN library, more than 53.4% of neighbor metabolites have a DP score larger than 0.5. In contrast, only 3.2% of non-neighbor metabolites have a DP score larger than 0.5. We further explored the appropriate cutoff of DPs. When the cutoff is set as 0.8, the percentage of non-reaction pairs with DPs larger than 0.8 decreases to 2.0%. However, the percentage of reaction pairs with DPs larger than 0.8 significantly decreases to 37.3%.

Correct
So we think the cut-off score of 0.5 is appropriate.
To demonstrate how the cosine similarities decay as a function of the reaction 'distance' between two compounds, we constructed RPs and non-RPs with 2, 3, 4 and 5 steps, respectively. As shown in Supplementary Fig. 1b, the percentages of RPs with DP > 0.5 significantly decreased to 26.5%, 15.5%, 10.3% and 8.0% from 55.3% (with 1 step) for 2, 3, 4 and 5 reaction steps, respectively.
We designed an experiment described as follows to compare the bonanza score 1 , Hybrid Similarity Search (HSS) score 2 , GNPS 3 and dot product (DP) score. In brief, RPs retrieved from our in-house library were utilized to calculated DP, bonanza, HSS and GNPS scores, respectively. As shown in Supplementary Fig.   1c, most of bonanza scores were smaller than dot products, especially for the RPs with high DP scores.
However, the DP and bonanza scores from ~74.0% of RPs had no significant differences (absolute difference < 0.2). In addition, the percentage of RPs with bonanza scores larger than 0.5 decreased to 30.9% compared to S35 55.3% using DP. Meanwhile, the percentage of non-RPs with bonanza scores larger than 0.5 decreased to 2.0% compared with 5.2% using DP score ( Supplementary Fig. 1f). On the contrary, most of HSS scores were larger than DP scores, especially for the RPs with low DP scores ( Supplementary Fig. 1d). However, the DP and HSS scores from ~80.0% of RPs had no significant differences (absolute difference < 0.2). In addition, the percentage of RPs with HSS scores larger than 0.5 increased to 63.9% compared with 55.3% using DP score.
Meanwhile, the percentage of Non-RPs with HSS scores larger than 0.5 also increased to 10.2% compared to 5.2% using DP score ( Supplementary Fig. 1f). As noted, all GNPS scores were equal to or greater than DP scores ( Supplementary Fig. 1e). However, the DP and GNPS scores from ~87.0% of RPs had no significant differences (absolute difference < 0.2). In addition, the percentage of RPs with GNPS scores larger than 0.5 increased to 67.3% compared with 55.3% using DP score. Meanwhile, the percentage of Non-RPs with GNPS scores larger than 0.5 also increased to 10.4% compared to 5.2% using DP score ( Supplementary Fig. 1f).
Since bonanza score is a more strict scoring system than DP score. Presumably, the use of bonanza score may decrease the false positives in MetDNA, but it may also increase the false negatives and decrease the number of annotated metabolites in the same time. On the contrary, the use of HSS score and GNPS may have an inverse effect (i.e., lower false negatives but higher false positive rate). If one would incorporate the bonanza, HSS or GNPS score into MetDNA, the systematic optimization of score cut-off and evaluation of the validation results are required. Therefore, we think dot product score is appropriate enough in the MetDNA. (

1) Optimization of RT match threshold
To systemically optimize the retention time match tolerance, two experiments were designed.

Experiment 1
For positive mode of Drosophila aging dataset, the metabolite annotation was first performed using our in-house standard tandem spectral library through m/z (m/z match error < 25 ppm) and MS spectral match (Dot product > 0.8). Then the peaks with annotations were confirmed using measured standard RTs (RT match error < 60 s), and labeled as Annotation 1. Meanwhile, these peaks with annotations were also matched with the theoretical RTs using different match tolerances, and labeled as Annotation 2. The RT match threshold was set from 10% to 100% with a step 10%, and infinity (i.e., removing the RT match). Then the Annotation 2 was compared to Annotation 1 to calculate the true positive rate and true negative rate according to the rule in Supplementary Table 7. According to the distributions of true positive and true negative rates, a RT match tolerance of 30% is determined ( Supplementary Fig. 7). Table 7 Rules for calculating true positive and true negative rates in validation experiment 1.

Experiment 2
For positive mode of Drosophila aging dataset, the initial seed metabolites were selected according to the workflow of MetDNA. Then 30% of initial seed metabolites were randomly selected as initial seed metabolites for MRN-based recursive metabolite annotation, and the remained 70% were as the validation dataset (similar to Validation experiment #2). This process was repeated 10 times. In order to evaluate the influence of RT match threshold on the annotation, the RT match threshold was set from 10% to 100% with a step 10%, and infinity (i.e., removing the RT match). For each RT match threshold, we ran the process described above. Finally, the peak redundancy and annotation correct rate using top 3 candidates were S37 calculated using the comparison between the validation dataset and annotations from MetDNA. The influences of RT match threshold on annotation correction rate and the peak redundancy were summarized in Supplementary Fig. 7b and 7c.
(2) Optimization of weights of annotation score Five weights, Wm/z1 and WRT1 for Scoreadduct and Wm/z2, WRT2 and Wspec for Scoreiden were systemically optimized in MetDNA. We first defined the combinations of these weights as follows.
Wm/z1 = 0.1 + 0.05 * n Finally, a total of 2,040 weight combinations were created, and used for MetDNA analysis one by one. A strategy similar in experiment 2 for optimizing RT match threshold was used. For each experiment, 30% of initial seed metabolites were randomly selected as initial seed metabolites for MRN-based recursive metabolite annotation, and the remained 70% were as the validation dataset. This process was repeated 10 times. In each experiment, a set of weight combination was used. Then, the annotation correction rate using top 3 candidates (no score cutoff) was calculated using the comparison between the validation dataset and annotations from MetDNA. Thus, a total of 2,040 annotation correction rates were calculated and plotted in Supplementary Fig. 8a. The correction rates ranged from 79.0% to 83.5%.
The weight combinations with correct rate > 82.0% (426 out of 2,040) were retrieved to investigate the optimal weight combinations. The value distributions of Wm/z1 and WRT1 in adduct annotation retrieved from the 426 combinations were plotted in Supplementary Fig. 8b. Clearly, the m/z match weight is generally higher than RT match weight. To simply the optimization process, four weigh combinations for adduction annotation were selected: 1) Wm/z1 is 0.9 and WRT1 is 0.1; 2) Wm/z1 is 0.85 and WRT1 is 0.15; 3) Wm/z1 is 0.8 and WRT1 is 0.2; and 4) Wm/z1 is 0.75 and WRT1 is 0.25. We further calculated the annotation coverages and correction rates using 12 weight combinations and different cutoffs of annotation scores ( Supplementary Fig.   S38   9). There were no significant differences in annotation coverage rates and correct rates when score cutoff was set from 0 to 0.4. When the score tolerance was set larger than 0.4, both annotation coverage and correction rate decreased (Supplementary Fig. 9). Therefore, the final score cutoff was determined as 0. Orbitrap Q-Extractive HF-X) and three different acquisition methods (data dependent acquisition (DDA), data independent acquisition (SWATH), and targeted MS2 acquisition) are also included (Supplementary Tables 2   and 3).
The information for dataset #1 (Drosophila aging dataset) was provided in Methods. Here, we provide more information for dataset #2-11.

Study 1 (Datasets #2-4). Metabolism regulation of aging in mouse liver tissue
Two groups of aging mouse liver tissues (c57BL/6J; 24-week vs. 78-week; n = 10 for each group) were collected to study the dysregulated metabolic pathways in aging mouse.

Sample preparation
Mouse liver tissue samples (~10 mg) were first homogenized with ceramic beads and 200 μL of H2O for three times using a Precellys homogenizers. The homogenization took 20 s with 5 s intervals for each time.

LC-MS analysis and data processing
Metabolomics data of aging mouse liver samples were acquired using different MS platforms and data acquisition methods (datasets #2-4 in Supplementary Table 2).
For dataset #2, metabolomics data acquisition was performed using  MS-DIAL were also processed using MetDNA.

MetDNA analysis results
In the dataset #2, a total of 21,607 and 18,091 peaks were detected in positive and negative modes, respectively. After MetDNA annotation, a total of 1,901 metabolites were annotated (1,301, 1,373 and 773 for positive, negative and both modes, respectively). A total of 16 dysregulated metabolic pathways were enriched (Hypergeometric test, P-values < 0.05).
In the dataset #3, a total of 14,532 and 10,904 peaks were detected in positive and negative modes, respectively. After MetDNA annotation, a total of 2,296 metabolites were annotated (1,699, 1,550 and 953 for S42 positive, negative and both modes, respectively). A total of 21 dysregulated metabolic pathways were enriched (Hypergeometric test, P-values < 0.05).
In the dataset #4, a total of 12,507 and 17,025 peaks were detected in positive and negative modes, respectively. After MetDNA annotation, a total of 953 metabolites were annotated (516, 658 and 221 for positive, negative and both modes, respectively). A total of 27 dysregulated metabolic pathways were enriched (Hypergeometric test, P-values < 0.05).

Study 2 (Datasets #5 and #6). The metabolic regulation function of RIP1 in MEF cell
Two groups of MEF cells (wild-type and RIP1 -/-; n = 6 for each group) were used to study the dysregulated metabolic pathways in RIP1 -/cells.

Sample preparation
The MEF cell pellets (about 10 6 cells) were extracted using a MeOH:ACN:H2O (2:2:1, v/v) solvent mixture. A volume of 1 mL of cold solvent was added to each cell pellet, vortexed for 30 s and incubated in liquid nitrogen for 1 min. The samples were then allowed to thaw at room temperature and sonicated for 10 min. This freeze-thaw cycle was repeated three times. Then the samples were incubated for 1 h at -20 C,

LC-MS analysis and data processing
For dataset #5, metabolomics data acquisition was performed similar to dataset #2, expect a 12-min gradient was used. The flow rate was 0.5 mL/min and the linear gradient was set as follows: 0-0. Datasets #5 and #6 were processed using R package xcms 4 and MetDNA.

MetDNA analysis results
In the dataset #5, a total of 27,270 and 20,615 peaks were detected in positive and negative modes, respectively. A total of 1,976 metabolites were annotated (1,390, 1,288 and 702 for positive, negative and both modes, respectively). A total of 22 dysregulated metabolic pathways were enriched (Hypergeometric test, P-values < 0.05).
In the dataset #6, a total of 12,783 and 11,781 peaks were detected in positive and negative modes, respectively. A total of 519 metabolites were annotated (336, 263 and 80 for positive, negative and both modes, respectively). A total of 16 dysregulated metabolic pathways were enriched (Hypergeometric test, P-values < S44 0.05).

Study 3 (Datasets #7 and #8). Metabolic changes between wide-type and daf-2 mutant C. elegans
Two groups of C. elegans (wild-type vs. daf-2 mutant, n = 6 for each group) were used to study the dysregulated metabolic pathways in daf-2 mutated C. elegans.

Sample preparation
C. elegans samples were collected into microcentrifuge tubes and then homogenized with ceramic beads and 200 μL of H2O for three times using Precellys homogenizers. The rest metabolite extraction was the same as the extraction of mouse liver tissues.

LC-MS analysis and data processing
For dataset #7, the data acquisition was performed the same as dataset #2.
For dataset #8, metabolomics data acquisition was performed using Dataset #7 and #8 were processed using R package xcms 4 and MetDNA.

MetDNA analysis results
In the dataset #7, a total of 45,370 and 45,310 peaks were detected in positive and negative modes, respectively. A total of 2,747 metabolites were annotated (2,013, 2,069 and 1,335 for positive, negative and both modes, respectively). A total of 28 dysregulated metabolic pathways were enriched (Hypergeometric test, S45 P-values < 0.05).
In the dataset #8, a total of 35,992 and 35,178 peaks were detected in positive and negative modes, respectively. A total of 837 metabolites were annotated (407, 632 and 202 for positive, negative and both modes, respectively). A total of 30 dysregulated metabolic pathways were enriched (Hypergeometric test, P-values < 0.05).

Study 4 (Dataset #9). Metabolomic profiling of E. coli with protein (α-syn) expression
Two groups of E. coli samples (wild-type vs. E. coli with α-syn expression, n = 10 for each group) were used to study the dysregulated metabolic pathways in E. coli with protein (α-syn) expression.

Sample preparation
The E. coli cell samples (OD600 nm = 1.0, 10 mL) were extracted using a MeOH:ACN:H2O (2:2:1, v/v) solvent mixture. The rest metabolite extraction was the same as the extraction of MEF cell sample.

LC-MS analysis and data processing
The LC-MS analysis and data processing of dataset #9 are the same as dataset #2.

MetDNA analysis results
In dataset #9, a total of 35,498 and 22,534 peaks were detected in positive and negative modes, respectively. A total of 2,340 metabolites were annotated (1,713, 1,590 and 963 for positive, negative and both modes, respectively). A total of 36 pathways dysregulated metabolic pathways were enriched (Hypergeometric test, P-values < 0.05).

Study 5 (Dataset #10). Metabolomic profiling of human colorectal cancer tissues
Two groups of colorectal tissues (cancer vs. adjacent healthy tissues, n = 10 for each group) were used to study the dysregulated metabolic pathways in colorectal cancer tissues.

Sample collection
The  Supplementary Table 8. The tumor tissues were excised from CRC patients during surgery. Meanwhile, the adjacent healthy tissues were surgically excised at 5 to 10 cm away from the tumor.
The tissue samples were immediately frozen using liquid nitrogen and stored at -80 °C until metabolite extraction.
Supplementary Table 8 Baseline and histopathologic characteristics of participant subjects in Study 5.

Sample preparation
The sample preparation of colorectal tissues was the same as the preparation of mouse liver tissues.

LC-MS analysis and data processing
The LC-MS analysis and data processing of colorectal sample are the same as dataset #3.

MetDNA analysis results
In dataset #10, a total of 8,797 and 4,467 peaks were detected in positive and negative modes, respectively. A total of 1,893 metabolites were annotated (1,334, 1,087 and 528 for positive, negative and both modes, respectively). A total of 23 dysregulated metabolic pathways were enriched (Hypergeometric test, P-values < 0.05).

Study 6 (Dataset #11). Metabolomic profiling of urine samples from esophagus cancer patients
Two groups of urine samples from patients with esophagus cancer (EC) and healthy controls (n = 20 for each group) were collected to study the dysregulated metabolic pathways in patients with esophagus cancer.

S47
Urine samples were collected at the Esophageal Cancer Screening Base of Shandong Province (Feicheng, Shandong Province, China). The study was approved by the Ethics Committee of the Shandong Tumor Hospital and written informed consent was obtained from all participants involved in this study. All the participants with ages 40-69 years enrolled in this base were screened for esophageal cancer using endoscopy with mucosal iodine staining. In this study, participants with normal esophageal mucosa (iodine-positive) was regarded as the health controls. Biopsies of the participants with iodine-negative pathology were taken from the non-staining area of the mucosa, which were then underwent pathological evaluation for confirming and staging by two independent pathologists. Each EC patient was diagnosed and staged according to the American Joint Committee on Cancer (AJCC) TNM Classification of Carcinoma of the Esophagus and Esophagogastric Junction (7th edition, 2010). Detailed baseline and histopathologic characteristics for these patients are listed in Supplementary Table 9. Participants involved in this study were not taking any medications, surgery, radiotherapy or chemotherapy, and those suffering from metabolic diseases, liver diseases, kidney diseases or any other cancers were excluded. All of the participants were in an overnight fasting state and 5 mL of urine was taken in the morning. The urine samples were stored at −80 °C until further analysis. Table 9 Baseline and histopathologic characteristics of participant subjects in Study 6.

Sample preparation
Urine samples were thawed at 4 °C on ice. Then, 50 μL of urine sample was taken and transferred into a microcentrifuge tube, and 150 μL of water was added to dilute urine. The diluted urine samples were centrifuged at 21,600 xg and 4 °C for 15 min. Finally, 100 μL of supernatant was then transferred to HPLC S48 glass vials and stored at -80 °C prior to LC-MS analysis.

LC-MS analysis and data processing
For dataset #11, metabolomics data acquisition was performed similar to dataset #2, expect for the mobile phase, column and linear gradient. A Merck SeQuant ZIC-HILIC column (particle size, 3.5 μm; 100 mm (length) × 2.1 mm (i.d.)) was used for the LC separation and the column temperature was kept at 25 °C. The mobile phase A was 10 mM ammonium acetate (NH4OAc) in 95% water and 5% acetonitrile, and B was 10 mM ammonium acetate (NH4OAc) in 5% water and 95% acetonitrile both in positive mode (ESI+) and The data processing methods are the same with dataset #2.

Metabolite annotation and pathway enrichment analysis results
For dataset #11, a total of 15,721 and 16,891 peaks were detected in positive and negative modes, respectively. A total of 2,355 metabolites were annotated (1,451, 1,854 and 950 for positive, negative and both modes, respectively). A total of 22 pathways were enriched (Hypergeometric test, P-values < 0.05).

Validation experiment 1
A total of 200 metabolite standards (Supplementary Table 4

Validation experiment 3
MassBank library was download from the official website of MassBank

Comparison of MetDNA and in-silico MS2 spectral match from CFM-ID
To benchmark and compare the performance of MetDNA and in silico MS2 spectra match, an experiment was designed and described below. All the metabolites in MRN (6,439 metabolites) were utilized to predict their theoretical MS2 spectral using the very popular command line tool of CFM-ID 8 [https://sourceforge.net/projects/cfm-id/] (version 2.0). The pre-trained set models (metab_se_cfm) and recommend parameters (param_output0.log) were used in both positive and negative modes, respectively. All predicted MS2 spectra were further normalized to the base peak (the highest peak intensity). For each metabolite, the 20, 30 and 50 eV MS2 spectra for positive and negative modes, respectively were generated. It means that one metabolite has 6 theoretical MS2 spectra with three collision energy values and two polarity modes.
Then we identified 200 standard metabolites in mouse liver samples using in silico MS2 spectra match.
Manual comparison with chemical standards demonstrated that 113 and 137 metabolites were detected and To investigate the influence of tandem spectral library size on the results of MetDNA, a strategy was designed and described below. For Drosophila aging dataset, the initial seed metabolites were selected according to the workflow of MetDNA. Then 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80% and 90% of all initial seed metabolites were randomly selected from all initial seed metabolites. The different numbers of initial seed metabolites were used to mimic the different sizes of tandem spectral libraries. The result using all initial seed metabolites were considered as the control. The results obtained using different numbers of initial seed metabolites were summarized, ands compared with the control to calculated annotated metabolite number (Fig. 4a), confidence level ( Supplementary Fig. 17a), and annotation redundancy ( Supplementary Fig. 17b). In addition, the overlap percentages of MS1 peaks with annotations using different initial seed metabolites and the overlap percentages of MS1 peaks with the exact same annotations using different initial seed metabolites were also calculated (Fig. 4b). In both cases, the annotation result using all seed metabolites was used as a control. The similar experiment was also repeated using the aging mouse liver dataset ( Supplementary Fig.   18).
The pathway enrichment results using 10%, 30%, 50%, 70% and 100% (i.e., all seed metabolites) of all initial seed metabolites were also summarized, and a Venn diagram was drawn to assess the consistence of pathway enrichment results with different size of tandem spectral library in MetDNA ( Supplementary Fig.   16).
For positive mode of Drosophila aging dataset, metabolites were firstly annotated through m/z match (m/z match error < 25 ppm), measured standard RT match (RT match error < 60 s) and MS2 spectral match (dot product > 0.8). Finally, 167 peaks with 107 metabolites were annotated and recorded as the correct seed metabolites. Then the correct seed metabolites were used to construct the misannotated seed metabolites. Take type 1 misannotated seed metabolite for instance. First, the peaks with correct annotations were retrieved, and for each peak with correct annotations, its correct annotations were removed, and then the same number of metabolites from KEGG database with m/z match error > 25 ppm and RT match error > 60 s were randomly selected and assigned to this peak. Finally, there were 167 peaks with 169 misannotated annotations for type 1 misannotated seed metabolites. Similarly, type 2, 3 and 4 misannotated seed metabolites were also constructed using the different criteria provided in Supplementary Table 4. From type 1 to type 4 misannotated metabolites, the degree of similarity compared to the correct metabolite increases. Thus, they become more challenging to distinguish, and may have higher chance to propagate during the iterative process. Then MetDNA analyses were performed using the correct or misannotated seed metabolites, and we compared and validated the annotation results (Supplementary Table 4).
To investigate the reason why the propagation of misannotated metabolites is intrinsically low, for each set of corrected or misannotated seed metabolites, we first retrieved the neighbor metabolites of all the seed metabolites. Then, neighbor metabolites were match to all MS1 peaks through m/z and RT matches. Only the MS1 peaks with m/z match error < 25 ppm and RT match error < 30% were remained. MS2 spectra from these peaks were compared with surrogate MS2 spectra of neighbor metabolites (i.e., MS2 spectra from seed metabolites), and to calculate DP scores. For each neighbor metabolite matched with multiple MS1 peaks, only the maximum DP score was retained for statistics. So if one set of seed metabolites had total n neighbor metabolites, there will be n DP scores. The DP distribution were calculated for each set of correct and misannotated seed metabolites (Fig. 4f). were processed using R package xcms (version 1.46.0) 4 for peak detection and alignment. The mzXML files were placed in two folders named as: W03 and W30 according to their groups. Then the data was processed using the code shown below:  (1) Keep columns named name, mzmed and rtmed and abundance of all samples.
(2) Rename the first three columns as name, mz and rt.
(3) Name the peak table Peak_Table_POS or Peak_Table_NEG according to its polarity.

Prepare MS2 data files
The 20 raw data files (.wiff format) in each ionization polarity were converted to mgf format using ProteoWizard [http://proteowizard.sourceforge.net/] (version3.0.6150). The parameter settings can be found in Table 1 in Methods.

Prepare sample information file
A sample information file (.csv format) was prepared to describe the sample group information. The first column was named as sample.name, while the second one was named as group. Two group names (W03 and W30) were provided.

Process metabolomics data using MetDNA
The MS1 peak  It is worthy to note that different TOF MS instruments have different resolutions and mass accuracies.
Specifically for Q-TOF MS in our study (Sciex TripleTOF and Agilent Q-TOF), the resolution and mass accuracy is m/z dependent. The low mass range (e.g., 50-400 Da) has a significantly lower resolution and mass accuracy than high mass range (e.g., 800-1200 Da). In addition, low ion intensity may also have a detrimental effect on mass accuracy. Therefore, it is best to use a wider mass tolerance (i.e., 25 ppm in our work) than the theoretical tolerance of an instrument to cover all possible situations (i.e., relative low-resolution TOF instrument, low mass ion, low-intensity, etc.). However, this tolerance can be adjusted by users. The similar suggested tolerance was also given in our previously published protocol 9 .