Reactomics: Using mass spectrometry as a chemical reaction detector

Chemical reactions among small molecules enable untargeted metabolomics analysis, in which small molecules within tissue samples are identified through high-throughput assays. In standard mass spectrometry-based metabolomics, significant small molecules are first identified, and their biochemical relationships are then probed to reveal biological fate (environmental studies) or biological impact (physiological response). However, we propose that biochemical relationships could be retrieved directly through untargeted high-resolution paired mass distance (PMD) analysis, which investigates chemical pairs in the samples without a priori knowledge of the identities of the participating compounds. We present the potential of this chemical reaction detector, or ‘reactomics’ approach, which links PMDs from the mass spectrometer to biochemical reactions obtained via data mining of known small-molecule metabolite/compound and reaction databases. This approach encompasses both quantitative and qualitative analysis of reactions by mass spectrometry, and its potential applications include PMD network analysis, source appointment of unknown compounds, and biomarker reaction discovery instead of compound discovery. Such applications may promote novel biological discoveries that are not currently possible with classical chemical analysis.


Introduction
Metabolomics or non-targeted analysis using high-resolution mass spectrometry is one of the most popular analytical methods for unbiased measurement of organic compounds (1,2). A typical metabolomics workflow consists of metabolite detection, statistical analysis, and annotation/identification of compounds using MS/MS and/or authentic standards. However, annotation or identification of unknown compounds is time-consuming and sometimes impossible, which may limit biological interpretation (3). Through MS/MS, experimentally obtained fragment ions of the chemical of interest can be matched to a mass spectral database (4), but many compounds remain unreported and therefore unmatchable. Alternatively, rule-based or data mining-based prediction of in silico fragment ions is successful in many applications (2,5), yet these approaches are prone to overfitting to known compounds. Finally, the validation step requires commercially available or synthesized analytical standards for unequivocal identification, but such standards may not be available for all compounds. As a result, the compound identification workflow is biased towards known compounds, and biological information from unknown compounds is not fully used.
Biochemical knowledge, through the integration of known relationships between biochemical reactions (e.g., pathway analysis), could also help provide potential molecular structures for annotating unknown compounds (3). Such methods are readily used to annotate compounds by chemical class. For example, the referenced Kendrick mass defect (RKMD) can be used to predict lipid class by first identifying a lipid through a specific mass distance (14.01565 Da) and then identifying specific mass distances of heteroatoms to further determine lipid class (6). Similarly, isotope patterns in combination with specific mass distances characteristic of halogenated compounds (e.g., +Cl/-H, +Br/-H) can be used to screen halogenated chemical compounds in environmental samples (7). In these examples, known mass relationships among compounds are used to annotate unknown classes of compounds, providing evidence that general relationship-based annotation has the potential to uncover unknown information from samples.
The most common relationships among compounds are chemical reactions. Substrate-product pairs in a reaction form by exchanging functional groups or atoms. In fact, almost all organic compounds originate from biochemical processes, such as carbon fixation (8,9). As in base pairing of DNA (10), organic compounds follow biochemical reaction rules that result in characteristic mass differences between paired substrates and their products. Our concept, paired mass distance (PMD), reflects such rules by calculating the mass differences of two compounds or charged ions. Mass distances can also directly reveal isotopologue information (11), adducts from a single compound (12), or adducts formed via complex in-source reactions (13). High-resolution mass spectrometry (HRMS) can directly measure such paired mass distances with the mass accuracy needed to provide reaction-level specificity. Therefore, HRMS has the potential to be used as a 'reaction detector' to enable reaction-level annotations. Such reaction-level information will provide an evidence-based link between protein/enzyme-level changes and compound/metabolite-level changes in the samples, providing additional biological information.
Here, we use multiple databases and experimental data to provide a proof of concept for using mass spectrometry as a reaction detector. We discuss potential applications of this approach: PMD network analysis, which can be used to search for metabolites that are biologically related to a targeted compound of interest; source appointment, which can be used to characterize unknowns as endogenous or exogenous; and biomarker reaction discovery, which can be used to calculate reaction-level changes as a predictor of disease.

Definition
We first define a reaction PMD ($\mathrm{PMD}_R$) using a theoretical framework. We then demonstrate how a $\mathrm{PMD}_R$ can be calculated using Kyoto Encyclopedia of Genes and Genomes (KEGG) reaction R00025 as an example (Eq. 1). Three KEGG reaction classes (RC00126, RC02541, and RC02759) are associated with this reaction, which is catalyzed by enzyme 1.13.12.16.
$$\text{Ethylnitronate} + \text{Oxygen} + \text{Reduced FMN} \rightleftharpoons \text{Acetaldehyde} + \text{Nitrite} + \text{FMN} + \text{Water} \tag{1}$$
In general, to define $\mathrm{PMD}_R$, we write a chemical reaction as follows:
$$S_1 + S_2 + \dots + S_n \rightleftharpoons P_1 + P_2 + \dots + P_m \quad (n \ge 1,\ m \ge 1) \tag{2}$$
where S denotes substrates and P denotes products, and n and m are the numbers of substrates and products, respectively. A PMD matrix [M1] for this reaction is generated: for each substrate, $S_k$, and each product, $P_i$, we calculate a PMD.
Assuming that the substrate-product pair with the minimum PMD shares a similar structure or molecular framework, we select the minimum numeric PMD as the compound PMD ($\mathrm{PMD}_{S_k}$) of that reaction (Eq. 3).
$$\mathrm{PMD}_{S_k} = \min(|S_k - P_1|, |S_k - P_2|, \dots, |S_k - P_m|) \quad (1 \le k \le n) \tag{3}$$
Then, the $\mathrm{PMD}_R$ is defined as the set of the substrates' PMDs (Eq. 4):
$$\mathrm{PMD}_R = \{\mathrm{PMD}_{S_1}, \mathrm{PMD}_{S_2}, \dots, \mathrm{PMD}_{S_n}\} \tag{4}$$
For KEGG reaction R00025, $S_1$ is ethylnitronate, $S_2$ is oxygen, $S_3$ is reduced FMN, $P_1$ is acetaldehyde, $P_2$ is nitrite, $P_3$ is FMN, $P_4$ is water, n = 3, and m = 4. In our example, $\mathrm{PMD}_R$ contains three values: 27.023 Da, which is equivalent to the mass of two carbon atoms and three hydrogen atoms; 12.036 Da, for the addition of two carbon atoms and four hydrogen atoms and the loss of one oxygen atom; and 2.016 Da, for the addition of two hydrogen atoms. One reaction can have multiple PMDs in its $\mathrm{PMD}_R$, but no more than n. $\mathrm{PMD}_R$ has two notations. One is the absolute difference of the substrate-product pairs' exact (monoisotopic) masses in Da. The other uses elemental compositions. Here, we describe an elemental composition rather than a chemical formula, because this notation also describes the gain and loss of elements and therefore the net mass change. In our example reaction, the $\mathrm{PMD}_R$ can also be written as +2C3H, +2C4H/-O, and +2H. This elemental composition can be linked to known chemical processes retrieved from a reaction database, i.e., KEGG. For example, +2H represents the elemental composition change of a reaction involving the breaking of a double bond, such as our KEGG example RC00126, and +2C3H indicates a reaction with nitronate monooxygenase (EC:1.13.12.16) or reaction class RC02541. However, some elemental compositions, such as +2C4H/-O in our example, might not have a clear mechanism (e.g., no suggested KEGG reaction selection). By this definition, $\mathrm{PMD}_R$ can be generated automatically in terms of elemental compositions or mass in Da.
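As a worked illustration of Eqs. 2 to 4, the following minimal Python sketch computes the PMD matrix and the reaction PMD for R00025. The molecular formulas are assumptions of this sketch (neutral formulas as commonly listed for these compounds), and `mono_mass` and `reaction_pmd` are hypothetical helper names, not functions of the pmd package.

```python
import re

# Monoisotopic masses (Da) of the elements needed for this example.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146, "P": 30.9737620}

def mono_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C2H4O' (no brackets or charges)."""
    return sum(MONO[el] * int(n or 1)
               for el, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula))

def reaction_pmd(substrates, products):
    """Return the PMD matrix |S_k - P_i| and the per-substrate minima (Eqs. 3 and 4)."""
    s_mass = [mono_mass(f) for f in substrates]
    p_mass = [mono_mass(f) for f in products]
    matrix = [[abs(s - p) for p in p_mass] for s in s_mass]
    return matrix, [min(row) for row in matrix]

# Assumed formulas: ethylnitronate, oxygen, reduced FMN <=> acetaldehyde, nitrite, FMN, water.
substrates = ["C2H4NO2", "O2", "C17H23N4O9P"]
products = ["C2H4O", "HNO2", "C17H21N4O9P", "H2O"]

matrix, pmd_r = reaction_pmd(substrates, products)
print([round(x, 3) for x in pmd_r])  # [27.023, 12.036, 2.016]
```

Under these assumed formulas, the three minima reproduce the 27.023, 12.036, and 2.016 Da values given in the text, one per substrate.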

PMD analysis of a reaction database
To demonstrate the feasibility of using mass spectrometry as a chemical reaction detector, here we show that common and biologically relevant reactions can be written as $\mathrm{PMD}_R$. KEGG, with 11262 reactions and 10213 unique formulas, was used as a reference reaction database (14

Qualitative PMD analysis
We propose that PMD analysis can be applied using mass spectrometry. Mathematically, a PMD of uncharged compounds is equivalent to the PMD of their charged species observed during mass spectrometry, as long as both compounds share the same adducts, neutral losses, and charges. A challenge with HRMS is that for each analyte there is usually not a single ion, but redundant peaks that include various adducts, in-source fragments, neutral losses, and isotopes that are generated from the same analyte. For PMD analysis we assumed that compounds involved in paired biological reactions will generate the same type of redundant peaks in the mass spectra.
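The equivalence between neutral and ionized PMDs can be checked in one line. As a sketch, assume two singly charged ions that carry the same adduct or neutral loss with net mass shift $a$; the symbols $M_1$, $M_2$, and $a$ are introduced here for illustration only.

```latex
\left| (M_1 + a) - (M_2 + a) \right| = \left| M_1 - M_2 \right|
```

The shift cancels only when both ions share the same $a$, so a pair of peaks with different adducts would not reproduce the PMD of the underlying neutral compounds.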
To perform PMD analysis, we must first reduce the redundant peaks to independent peaks. Using the GlobalStd algorithm (12) or pseudo-spectra from annotation tools such as CAMERA (16) or RAMClust (17), a single peak representing the same type (adduct, neutral loss, or isotope) between paired analytes is selected for each cluster of redundant peaks. When the resulting filtered peaks are used for PMD analysis, they can then be linked to a specific biological reaction ($\mathrm{PMD}_R$).
Once PMDs are calculated, linking the $\mathrm{PMD}_R$ to specific elemental compositions provides valuable biological context. However, annotating the elemental composition of a given PMD depends on high-resolution mass spectrometers. Low-resolution instruments that only measure nominal mass may not be specific enough to distinguish elemental compositions. For example, a PMD of 14 Da could be the addition or loss of a nitrogen atom, or the addition of one oxygen atom and loss of two hydrogen atoms.
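The 14 Da ambiguity can be made concrete with a short sketch. It lists three candidate elemental compositions with a nominal mass change of 14 Da (the nitrogen and +O/-2H cases from the text, plus the 14.01565 Da CH2 distance mentioned in the Introduction) and reports how many distinct values remain at each rounding precision. The `candidates` dictionary and the `distinct_values` helper are illustrative assumptions.

```python
# Monoisotopic masses (Da).
H, C, N, O = 1.00782503, 12.0, 14.0030740, 15.9949146

# Candidate elemental compositions whose nominal mass change is 14 Da.
candidates = {
    "+N": N,
    "+O/-2H": O - 2 * H,
    "+CH2": C + 2 * H,
}

def distinct_values(digits):
    """Number of distinct PMD values left after rounding to `digits` decimals."""
    return len({round(mass, digits) for mass in candidates.values()})

for digits in (0, 1, 2, 3):
    rounded = {name: round(mass, digits) for name, mass in candidates.items()}
    print(digits, rounded, "distinct:", distinct_values(digits))
```

With only three candidates, the values collapse to a single PMD at zero or one decimal and separate from two decimals onward. A full formula database contains far more candidate compositions, so this toy case does not imply that two decimals are sufficient in practice.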
Here, we use HMDB (15) to compare low-resolution versus high-resolution measurements in determining elemental compositions. HMDB contains 114,100 compounds with 11,523 unique chemical formulas with known elemental compositions. PMD, as well as the elemental composition, was computed for the unique chemical formulas rounded to one, two, or three decimal places. Higher frequencies of a PMD are observed when rounding to fewer digits, suggesting the presence of false positives (Table 2). As confirmation of the annotation accuracy, we determined how many of the PMDs in Table 2 resulted from a change in chemical formula linked with the appropriate PMD for the range of reported decimal places ( PMDs, accuracy > 94% was observed when the PMDs were rounded to three decimals, only ≥ 51% when rounded to two decimal places, and < 10% when only 1 or 0 decimals were used (see Table 3), confirming that high-resolution mass spectrometry is required for qualitative PMD analysis and elemental composition annotation.

Quantitative PMD analysis
In addition to qualitative analysis, peaks that share the same PMD can be summed and used as a quantitative group measure of that specific 'reaction' in the sample, thereby providing a description of reaction-level changes in a sample without annotating individual compounds. There are two types of PMD across samples: static PMDs, for which the intensity ratios between the pairs are stable across samples, and dynamic PMDs, for which the intensity ratios between the pairs change across samples. Only static PMDs, those with similar instrument response, can be used for quantitative analysis, which avoids the complexity of changes from multiple peaks (see Table 4 for a theoretical example). Similar to another non-targeted analysis (18), we suggest that a PMD be considered static when the relative standard deviation (RSD) of the pair intensity ratios is < 30% and the correlation between the paired peaks' intensities is high (> 0.6). We provide functions in the pmd package to determine static PMDs. In-source reactions, mass accuracy, and stability of the paired intensity ratio are three important considerations for reaction-level qualitative and quantitative analysis via PMDs. The described tools and methodologies, namely removal of redundant peaks, use of HRMS, and static PMD selection, can overcome these challenges to enable use of HRMS as a reaction detector.
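The two static-PMD criteria can be sketched as follows. This is a minimal illustration, not the pmd package's implementation: the function names are hypothetical, the correlation is assumed to be Pearson's (the text does not name the coefficient), and the intensities are synthetic.

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def is_static_pmd(intensity_a, intensity_b, rsd_cutoff=30.0, cor_cutoff=0.6):
    """True if the pair's ratio RSD (%) is below the cutoff and the correlation is above it."""
    ratios = [a / b for a, b in zip(intensity_a, intensity_b)]
    rsd = 100.0 * stdev(ratios) / mean(ratios)
    return rsd < rsd_cutoff and pearson(intensity_a, intensity_b) > cor_cutoff

# Synthetic intensities of one peak pair across four samples.
peak_a = [100.0, 200.0, 300.0, 400.0]
static_b = [52.0, 98.0, 155.0, 196.0]    # ratio stays close to 2 in every sample
dynamic_b = [400.0, 150.0, 90.0, 20.0]   # ratio changes strongly across samples

print(is_static_pmd(peak_a, static_b))   # True
print(is_static_pmd(peak_a, dynamic_b))  # False
```

Both cutoffs are exposed as parameters because the 30% and 0.6 values are suggestions in the text rather than fixed constants.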

Reactomics
We suggest 'reactomics' as a new approach to investigate reaction-level changes in biological and environmental samples and to link untargeted metabolomics data with biological processes.

PMD network analysis
PMD network analysis enables identification of metabolites associated with a known biomarker of interest. For example, links between high-frequency PMDs from the KEGG reaction database and a target analyte can be determined. The selected metabolites caffeine, glucose, bromophenol, and 5-cholestene were paired with other metabolites in the KEGG reaction database using the top 20 high-frequency PMDs from Table 1. Different topological properties (e.g., number of nodes, communities, etc.) of the compounds' PMD networks were observed for each selected target metabolite (Figure 1). Comparing these networks with known pathways may allow tentative annotation of unknown pathways. For example, an unknown compound whose PMD network has a topological structure similar to that of caffeine (see Figure 1) might have similar biological activity to caffeine.
PMD network analysis can also be used in combination with classic identification techniques to enhance the networks associated with targeted biomarkers. As a proof of concept, we re-analyzed data from a published study of the metabolites of tetrabromobisphenol A (TBBPA) in exposed pumpkin (19,20) using a local, recursive search strategy (see Figure 2). Using TBBPA as the target of interest, we searched for PMDs linked with debromination, glycosylation, malonylation, methylation, and hydroxylation, which are phase II reactions (e.g., primary metabolites) found in the original paper (19). The peaks identified with these PMDs were added to the network as secondary metabolites, and the process was repeated until all PMDs and extensions were exhausted. Using PMD network analysis, we identified 22 unique m/z ions of potential TBBPA metabolites; 15 of these were not described in the original study. Most of the potential metabolites were higher-generation TBBPA metabolites (Figure 2), which are too computationally intensive to identify using in silico prediction.
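The local, recursive search described above can be sketched as a breadth-first search over a peak list. This is an illustrative sketch, not the procedure used in the study: the function name, the 0.002 Da tolerance, the seed m/z, and the peak list are all assumptions, and the mass shifts are computed here from monoisotopic atomic masses rather than taken from the paper.

```python
from collections import deque

# Mass shifts (Da) for the reaction types named in the text (assumed values).
PMDS = {
    "debromination (+H/-Br)": 77.9105,
    "glycosylation (+C6H10O5)": 162.0528,
    "malonylation (+C3H2O3)": 86.0004,
    "methylation (+CH2)": 14.0157,
    "hydroxylation (+O)": 15.9949,
}

def pmd_network(seed, peaks, pmds, tol=0.002):
    """Breadth-first search: link peaks whose distance to a known node matches a PMD."""
    found, edges, queue = {seed}, [], deque([seed])
    while queue:
        node = queue.popleft()
        for mz in peaks:
            if mz in found:
                continue
            for name, pmd in pmds.items():
                if abs(abs(mz - node) - pmd) <= tol:
                    found.add(mz)              # new node: searched again in a later round
                    edges.append((node, mz, name))
                    queue.append(mz)
                    break
    return found, edges

# Synthetic peak list built around an illustrative seed m/z.
seed = 538.7498
first_gen = seed - 77.9105           # one debromination step from the seed
peaks = [
    seed,
    first_gen,
    seed + 14.0157,                  # one methylation step from the seed
    first_gen + 162.0528,            # second generation: debromination, then glycosylation
    500.1234,                        # unrelated peak
]

found, edges = pmd_network(seed, peaks, PMDS)
for parent, child, reaction in edges:
    print(f"{parent:.4f} -> {child:.4f}  {reaction}")
```

Because every newly linked peak is queued as a search origin, the second-generation peak is reached through the first-generation one even though it has no matching PMD to the seed itself.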

Source appointment of unknown compounds
When an unknown compound is identified as a potential biomarker, determining whether it is associated with endogenous biochemical pathways or exogenous exposures can provide important information toward identification. We found that high-frequency PMDs from HMDB are dominated by reactions with carbon, hydrogen, and oxygen (

Biomarker reactions
Reactomics can also be used to discover biomarker 'reactions' instead of biomarker 'compounds'. Unlike a typical biomarker, which is a specific chemical compound, a biomarker reaction comprises all peaks that share a fixed PMD relationship and pass a correlation coefficient cutoff.
Thus, quantitative PMD analysis can be used to determine if there are differences between groups (e.g., control or treatment, exposed or not-exposed, etc.) on a reaction level. Such differences would be described as a biomarker 'reaction'.
In the publicly available dataset MTBLS28 (21), four independent peaks from 1807 samples generated the quantitative responses of PMD 2.02 Da. This biomarker reaction (e.g., +2H from our annotated database) was significantly decreased in case samples compared with the control group (t-test, p < 0.05; see Figure 4). The original publication associated with this dataset did not report any molecular biomarker associated with this reaction (21). Thus, quantitative PMD analysis can offer additional information on biological differences between groups at the reaction level that may be lost when focused on analysis at the chemical level. Furthermore, these results suggest that follow-up analysis in this population should include targeted analysis of proteins or enzymes linked with +2H changes.
Figure 4. Quantitative PMD analysis identifies PMD 2.02 Da as a potential biomarker reaction for lung cancer (MTBLS28 dataset).
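A reaction-level group comparison can be sketched in a few lines: the intensities of all peaks assigned to one PMD are summed per sample, and the summed responses are compared between groups. The data below are synthetic, the function names are hypothetical, and Welch's t statistic is used as an assumption, since the text does not state which t-test variant was applied.

```python
from statistics import mean, variance

def reaction_response(sample_peaks):
    """Sum the intensities of all peaks assigned to one PMD in a single sample."""
    return sum(sample_peaks)

def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (no p-value is computed here)."""
    va = variance(group_a) / len(group_a)
    vb = variance(group_b) / len(group_b)
    return (mean(group_a) - mean(group_b)) / (va + vb) ** 0.5

# Synthetic data: each inner list holds the intensities of four paired peaks in one sample.
control = [[10, 12, 8, 11], [11, 13, 9, 10], [12, 11, 10, 12]]
case = [[7, 8, 6, 7], [6, 9, 5, 8], [8, 7, 6, 6]]

control_resp = [reaction_response(s) for s in control]
case_resp = [reaction_response(s) for s in case]
t_stat = welch_t(case_resp, control_resp)
print(control_resp, case_resp, t_stat)
```

A negative statistic here means the summed reaction response is lower in the case group than in the control group, which is the direction reported for PMD 2.02 Da in the text.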

Conclusion
We provide the theoretical basis and empirical evidence that high-resolution mass spectrometers can be used as reaction detectors through calculation of high-resolution paired mass distances and linkage to reaction databases such as KEGG. Reactomics, as a new concept in bioinformatics, can be used to find biomarker reactions or develop PMD networks. These techniques can provide new information on biological changes, to ultimately promote novel biological inferences that may not be observed through classic chemical biomarker discovery strategies.