Subpathway-CorSP: Identification of metabolic subpathways via integrating expression correlations and topological features between metabolites and genes of interest within pathways

Feng, Chenchen; Zhang, Jian; Li, Xuecang; Ai, Bo; Han, Junwei; Wang, Qiuyu; Wei, Taiming; Xu, Yong; Li, Meng; Li, Shang; Song, Chao; Li, Chunquan

doi:10.1038/srep33262

Download PDF

Article
Open access
Published: 14 September 2016

Subpathway-CorSP: Identification of metabolic subpathways via integrating expression correlations and topological features between metabolites and genes of interest within pathways

Chenchen Feng¹^na1,
Jian Zhang¹^na1,
Xuecang Li¹^na1,
Bo Ai¹^na1,
Junwei Han²^na1,
Qiuyu Wang³^na1,
Taiming Wei⁴^na1,
Yong Xu⁵^na1,
Meng Li¹^na1,
Shang Li²^na1,
Chao Song⁶^na1 &
…
Chunquan Li¹^na1

Scientific Reports volume 6, Article number: 33262 (2016) Cite this article

1674 Accesses
10 Citations
Metrics details

Subjects

Abstract

Metabolic pathway analysis is a popular strategy for comprehensively researching metabolites and genes of interest associated with specific diseases. However, the traditional pathway identification methods do not accurately consider the combined effect of these interesting molecules and neglects expression correlations or topological features embedded in the pathways. In this study, we propose a powerful method, Subpathway-CorSP, for identifying metabolic subpathway regions. This method improved on original pathway identification methods by using a subpathway identification strategy and emphasizing expression correlations between metabolites and genes of interest based on topological features within the metabolic pathways. We analyzed a prostate cancer data set and its metastatic sub-group data set with detailed comparison of Subpathway-CorSP with four traditional pathway identification methods. Subpathway-CorSP was able to identify multiple subpathway regions whose entire corresponding pathways were not detected by traditional pathway identification methods. Further evidences indicated that Subpathway-CorSP provided a robust and efficient way of reliably recalling cancer-related subpathways and locating novel subpathways by the combined effect of metabolites and genes. This was a novel subpathway strategy based on systematically considering expression correlations and topological features between metabolites and genes of interest within given pathways.

Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis

Article Open access 25 March 2024

Wenpin Hou & Zhicheng Ji

Inferring gene regulatory networks from single-cell multiome data using atlas-scale external data

Article Open access 12 April 2024

Qiuyue Yuan & Zhana Duren

Single-cell multi-ome regression models identify functional and disease-associated enhancers and enable chromatin potential analysis

Article Open access 21 March 2024

Sneha Mitra, Rohan Malik, … Christina S. Leslie

Introduction

Metabolic dysfunction is a primary cause of many complex diseases that have become prevalent in humans such as prostate cancer, breast cancer and cardiovascular disease^1,2,3. In recent years, various ‘omics’ technologies including genomic, transcriptomics, proteomics and metabolomics have grown. These technologies are concerned with comprehensive analysis of metabolites and genes associated with specific diseases. Metabolic pathway analysis has become a successful strategy for understanding large amounts of molecules generated by these ‘omics’ technologies. Methods such as overrepresentation approach and gene set enrichment analysis have been developed and widely applied to identify pathways^{4,5,6,7,8,9,10,11,12,13}. However, most of these methods use gene information only and not the combined effect of metabolite and gene analysis^{4,10,11,12,13,14,15}. Because metabolic pathways involve both metabolites and genes, their dysfunction is closely related to pathogenesis of complex diseases^16,17,18. Thus, integrative pathway analysis of both metabolites and genes would help in interpreting metabolic function and its underlying biological significance. Recently, a metabolic pathway analysis method, IMPaLA, was proposed by Kamburov et al. to identify pathways¹⁶. This method integrated the enrichment significance of pathways based on overrepresentation approach, which was calculated by multiplying the P-values of metabolites and genes, to improve pathway identification performance. However, IMPaLA did not use expression correlations or topology embedded in the pathways, regarding pathways as simple gene sets. Metabolites or genes in a pathway do not always show the same topological importance. Therefore, simply considering the statistical features might not comprehensively reflect the superiority of pathway analysis. Moreover, metabolic pathways are often too large to interpret the relevant biological phenomena accurately. Key subpathway regions of the entire corresponding pathway may be more useful for reflecting the relevant biological importance¹⁹. Many studies have confirmed that abnormal subpathway regions of metabolic pathways were likely to result in human complex diseases^15,20,21.

In our previous study, we developed an analytical method called Subpathway-GM to identify biologically meaningful metabolic subpathways. We analyzed lenient distance similarities of important nodes within metabolic pathways to locate key subpathway regions¹⁹. The results demonstrated that subpathway identification strategy along with pathway topology was able to identify disease-related subpathway regions successfully and reliably. However, Subpathway-GM did not fully consider expression correlation information of metabolites and genes within the pathways. Metabolic pathways are the classical focus in systems biology for studying the dynamic changes of metabolites and genes in a given biological context²². In metabolic pathways, genes and families of enzymes control cellular metabolic phenotypes through the totality of small-molecule metabolites. The biological variation of metabolite concentrations reflects tiny homeostatic adjustments that maintain the metabolome in a steady state²³. In chemical reactions, enzymes catalyze reactions at different efficiencies and rates. The chemical processes controlled by enzymes involve metabolites as substrates, intermediates or end-products. Important concentration changes of some enzymes can lead to important changes in metabolite concentrations²⁴. In addition, metabolite concentration changes in cells could also be controlled by intrinsic genetic alterations. Perturbations of metabolite concentrations indirectly affect the expression of genes encoding enzymes, and expression changes of these genes reflect the topology of the pathway²⁵. In summary, the molecules within the metabolic pathways, including metabolites and genes, influence each other through expression levels or molecular concentrations. The demonstration of the expression correlations in biological systems has opened a new window on metabolic research, allowing monitoring of the effects of cellular perturbations on metabolic pathways²⁶. Therefore, even though specifying the differential expression of metabolites or genes is currently possible, discussing the correlation between expression changes in genes encoding enzymes and changes in metabolite concentrations is pivotal for a more comprehensive picture of metabolic pathways²⁷. In conclusion, we propose that expression correlations of metabolites and genes should be considered to locate key metabolic subpathway regions related to the underlying biological phenomena of complex human diseases. Knowing these correlations would provide significantly more information about metabolic system.

In this study, we proposed a powerful method called Subpathway-CorSP to identify metabolic subpathway regions by integrating expression correlations and topological features of metabolites and genes of interest within pathways. Importantly, we improved on original pathway identification methods by using subpathway identification strategy and emphasizing expression correlations between metabolites and genes of interest based on topological features within the metabolic pathways. First, we integrated differentially expressed (DE) metabolites, DE genes, as well as metabolites and genes with high expression correlation related to the study condition (prostate cancer) into the corresponding compound and enzyme nodes (nodes of interest) within reconstructed metabolic pathway graphs. Then, we considered expression correlations between metabolites and genes of interest based on topological features to locate key metabolic subpathway regions whose internal structures were highly correlated for expression. Finally, we used hypergeometric test to evaluate the enrichment significance of these subpathway regions. Subpathway-CorSP was applied to a prostate cancer data set and its metastatic sub-group data set. We demonstrated that the method was able to identify multiple disease-related subpathway regions robustly and efficiently.

Results

We applied Subpathway-CorSP to the identification of prostate cancer-related metabolic subpathways. The schematic overview of Subpathway-CorSP is in Fig. 1. To demonstrate that Subpathway-CorSP would identify meaningful metabolic subpathways, we compared Subpathway-CorSP with other methods at the system level. Four methods were used to identify pathways for the same data set: Pathway-G, Pathway-M, IMPaLA and Subpathway-GM. These methods are commonly used for pathway identification and have been included in many pathway analysis tools^4,8,9,16. Pathway-G uses only DE genes to identify entire pathways via hypergeometric test⁴. Similar to Pathway-G, Pathway-M uses only DE metabolites^7,8,9. IMPaLA integrates DE metabolites and DE genes to identify entire pathways via hypergeometric test, without considering pathway structure or a subpathway identification strategy¹⁶. Subpathway-GM integrated DE metabolites and genes to identify metabolic subpathways, considering their positions and cascade regions within the given pathway, but without expression correlation information¹⁹.

Subpathway-CorSP identified meaningful metabolic subpathways

We identified 48 DE metabolites by Wilcoxon rank-sum test (P < 0.05) and 1182 DE genes by SAM method (FDR < 0.001) between benign prostate samples (Benign) and prostate cancer samples (PCA+ Mets). Meanwhile, we identified 40 metabolites and 5581 genes with high expression correlation using Pearson correlation coefficient (Pcc) >0.8. We obtained 63 metabolites and 5846 genes of interest by combining a union set of the above metabolites and genes (Table 1). With these nodes of interest, Subpathway-CorSP located 92 potential metabolic subpathways with s = 5 from all metabolic pathways (Supplementary Data set S1). We set the parameter s = 5, similar to the previous Subpathway-GM method¹⁹. This type of subpathway (s ≥ 5) has been reported to be associated with cancer in some studies and is considered to represent a pathway¹⁹. In contrast, a small-scale node set (s < 5) was a scatter node set lacking sufficiently biological significance.

Table 1 The number of metabolites and genes used in Subpathway-CorSP.

Full size table

With a statistical significance level of 0.05, Subpathway-CorSP identified 20 significant metabolic subpathways. These subpathways also corresponded to 20 entire metabolic pathways (Fig. 2A). Of these pathways, 18 (90%) were well reported to be associated with cancer (Table 2). Many pathways identified by Subpathway-CorSP were undetected by Pathway-G, Pathway-M, IMPaLA or Subpathway-GM. Pathway-G located 62 entire metabolic pathways, but did not find any significant pathways at the 5% significance level (Supplementary Data set S1). Thus, all the significant pathways identified by Subpathway-CorSP were not identified by Pathway-G (Fig. 2A). Pathway-M located 42 potential metabolic pathways and identified 13 significant pathways at the 5% significance level (Supplementary Data set S1). However, Subpathway-CorSP identified an additional 15 (75%) not identified by Pathway-M (Fig. 2A). The limited abilities could also be seen in IMPaLA and Subpathway-GM. IMPaLA found 15 significant pathways (P < 0.05) from 32 potential metabolic pathways (Supplementary Data set S1). However, 70% of the 20 significant pathways identified by Subpathway-CorSP were not considered significant by IMPaLA (Fig. 2A). Subpathway-GM identified 29 significant subpathways with a cut-off value of P < 0.05 (Supplementary Data set S1). These subpathways corresponded to 28 entire pathways, 13 of which were identified by Subpathway-CorSP. Of the 20 significant pathways identified by Subpathway-CorSP, 45% were not considered significant by Subpathway-GM (Fig. 2A).

Table 2 The significant subpathways identified by Subpathway-CorSP using prostate cancer data set (P < 0.05).

Full size table

Subpathway-CorSP tended to locate key subpathway regions that mainly considered expression correlation information. Therefore, we tested if the significant subpathways identified showed clearly higher expression correlations. We obtained all Pcc values of metabolite/gene pairs within 20 significant subpathways and calculated those within any single metabolic pathway. The average Pcc value for the 20 significant subpathways was significantly higher than for the corresponding 20 entire pathways (P = 2.861e-06, Wilcoxon rank-sum test) (Fig. 2B). The average Pcc value for the 20 significant subpathways was also significantly higher than for all the metabolic pathways (P = 1.271e-05, Wilcoxon rank-sum test) (Fig. 2B). These results demonstrated that Subpathway-CorSP improved subpathway identification by emphasizing expression correlations between metabolites and genes of interest.

We noticed that up to eight pathways identified by Subpathway-CorSP were simultaneously not detected by Pathway-G, Pathway-M, IMPaLA and Subpathway-GM (Table 2). These pathways either had no significant P-values or were not found by Pathway-G, Pathway-M, IMPaLA and Subpathway-GM. We focused on three of the eight additional pathways that contained both metabolites and genes of interest.

The most significant subpathway (path:00561_1) of the three additional subpathways was in glycerolipid metabolism pathway (Fig. 3). Subpathway-CorSP yielded a P-value of 0.00464, but was not considered significant by Pathway-G, Pathway-M, IMPaLA or Subpathway-GM (P > 0.1). Glycerolipids usually refer to esters generated by esterification of glycerols and fatty acids, including saturated fatty acids and unsaturated fatty acids, which are important in killing tumor cells, inducing cell apoptosis and enhancing cellular and humoral immunity²⁸. Thirty years ago, glycerolipids were investigated biochemically as novel cellular signaling entities. These biomolecules occupy signaling nodes critical to a number of physiological and pathological processes²⁹. Glycerol-based lipids have prominent functions in human physiology and complex diseases from fat storage and metabolic disorders to survival pathways in cancers³⁰. Moreover, the key region (Fig. 4A) where glycerol is converted to triacylglycerol by glycerol kinase, acyltransferase, acylglycerol kinase and other enzymes is closely related to cellular proliferation, carcinogenesis and cancer survival and mortality risk³¹. Lipases in this subpathway region effectively affect prostate cancer cell survival and invasion in vitro and reduce prostate tumor growth³². Compound and enzyme nodes mapped by metabolites and genes of interest were closely located in the core subpathway region of the glycerolipid metabolism pathway. The average Pcc value for the glycerolipid metabolism subpathway was significantly higher than for the corresponding entire pathway (Supplementary Table S1). This result indicated that high expression correlations could help locate important subpathway regions and explain biological phenomena.

The second significant subpathway (path: 00270_2) belonged to a cysteine and methionine metabolism pathway and had a P-value of 0.0153 (Fig. 4B). The corresponding pathway was not considered significant by Pathway-G (P = 0.129), and was not found by Pathway-M, IMPaLA and Subpathway-GM. In this pathway, methionine metabolites, including cystathionine and cysteine, significantly increased the ability to predict aggressive prostate cancer³³. Furthermore, methionine metabolism involves mechanisms for sarcosine formation, and sarcosine is a potential key metabolic intermediary of prostate cancer cell invasion and aggressivity¹⁸. Cystathionine is at the center of the cysteine and methionine metabolism pathway. From a biological point of view, the center of the pathway where nodes of interest locate displays important information. In addition to cystathionine, closely connected enzymes encoded by genes of interest include adenosylhomocysteinase, methyltransferase and methylthioadenosine phosphorylase, which form an approximately circular subpathway region. This subpathway had significantly higher average Pcc value than the corresponding entire pathway (Supplementary Table S1), demonstrating the importance of expression correlations in subpathway identification. Serine, which is converted by galactitol to cystathionine, is also a core metabolite in glycine, serine and threonine metabolism (Fig. 4B). Glycine, serine and threonine metabolism was the most significant metabolic subpathway (a P-value of 0.00024) identified by Subpathway-CorSP and is reported to be highly associated with metastatic prostate cancer¹⁸. In conclusion, Subpathway-CorSP identified prostate cancer-related subpathways and tended to locate key regions effectively, using both expression correlations and topological features between nodes of interest within the metabolic pathways.

The third significant subpathway (path:00360_1) belonged to a phenylalanine metabolism pathway (Supplementary Figure S1). Subpathway-CorSP analysis yielded a P-value of 0.0213 for this subpathway. The corresponding pathway was not considered significant by Pathway-M (P = 0.238), and was not found by Pathway-G, IMPaLA and Subpathway-GM. This subpathway was not at the core region of the corresponding pathway, but is reported to be highly associated with cancer (Table 2). Two metabolites of interest in this subpathway, phenylalanine and tyrosine, are novel potential biomarkers for bladder cancer. These potential biomarkers have diagnostic value and indicate the risk of cancer recurrence³⁴. Other research shows that selective tyrosine and phenylalanine restriction target mitochondria to induce apoptosis of DU145 and PC3 prostate cancer cells³⁵. An enzyme encoded by a gene of interest, aromatic-L-amino-acid decarboxylase, is related to substance dependence of human cancers and functions as a macrophage migration inhibitory factor³⁶. The other enzyme encoded by a gene of interest, primary-amine oxidase, is expressed in mammals and involved in processes such as leukocyte trafficking and glucose metabolism³⁷. These two enzymes might be possible therapeutic targets for cancer treatment. The other methods tended to not detect this kind of pathway because of the low ratio of metabolites and genes of interest involved. However, from the view of expression correlation, these nodes of interest in the phenylalanine metabolism subpathway had significantly higher average Pcc value than the entire corresponding pathway (Supplementary Table S1). Thus, Subpathway-CorSP effectively detected biologically meaningful pathways via emphasizing expression correlations between metabolites and genes of interest within metabolic pathways.

Stability analysis and parameter analysis of Subpathway-CorSP

To validate the stability and reliability of the significant metabolic subpathways, stability analysis was performed. We extracted part of original samples as new data sets, and then identified new significant metabolic subpathways. We tested the number of original significant subpathways that were recalled. The more recalled subpathways, the more stable the subpathways that were identified. The steps were: First, keeping the metabolites and genes invariant, we simultaneously chose 75% of the benign prostate samples (Benign) and 75% of the prostate cancer samples (PCA+ Mets) without replacement from both the metabolomic profile and the matched gene expression profile. This yielded a training data set containing 12 benign prostate samples and 18 prostate cancer samples. Second, with the training data set, we performed Subpathway-CorSP strategy again and identified new significant metabolic subpathways. Third, the recalled number of original significant subpathways was counted, and the training experiment repeated 80 times. At least 12 and at most 18 significant subpathways were recalled (Fig. 5A). The recall rate was about 63.16–94.74%, and the average recall rate was up to 79.55%, although only 75% of the samples were used (Supplementary Table S2). Thus, we considered the significant metabolic subpathways identified by Subpathway-CorSP stable and reliable.

This study focused on identifying key metabolic subpathways by integrating expression correlations and topological features between nodes of interest within pathways. When two nodes of interest had higher Pcc value and smaller shortest path, they had larger CorSP score, which represented their close connection in a pathway. Thus, these two nodes tended to be added to one node set (that was a subpathway), and other nodes at their shortest path were added to the same node set simultaneously. Based on this theory, Subpathway-CorSP identified key subpathway regions in entire metabolic pathways. To set an appropriate threshold, m, for the identification of effective subpathways, we computed CorSP scores for each two nodes of interest in all metabolic pathways. We set m, by experience, to represent the score for 70% CorSP scores passing threshold m for all 150 metabolic pathways (Fig. 5B). The threshold m indicated the minimum permitted correlation extent at the shortest path between two nodes of interest in the pathway. A larger m value indicated that only nodes with higher expression correlations could be added to the same node set. The identified subpathways thus formed a smaller scale than a smaller m. Thus, we identified subpathways closely associated with metabolites and genes of interest. In contrast, decreasing m usually increased the number of other nodes except for nodes of interest within subpathways; it also increased the scale of the subpathways. Thus, we identified as many metabolites and genes associated with nodes of interest as possible.

Subpathway-CorSP provided biologically informative models for metastatic prostate cancer

To evaluate the specific pathological features of metastatic prostate cancer and further demonstrate the use of Subpathway-CorSP, we identified metastatic prostate cancer-related metabolic subpathways. Considering different types of samples that were mixed into a single dataset, we used 12 localized prostate cancer samples (PCA) and 12 metastatic prostate cancer samples (Mets) to identify metabolites and genes of interest. Thus, 74 metabolites and 8765 genes of interest associated with metastatic prostate cancer were obtained (Table 1). Subpathway-CorSP analysis of this sub-group data set detected 103 potential subpathways from all metabolic pathways. Of these, 14 that corresponded to 14 entire pathways were identified at the 5% significance level (Supplementary Data set S1). These pathways were all associated with cancer in the literature (Table 3). Of these pathways, identified by Subpathway-CorSP, 13 were not identified by Pathway-G, 9 by Pathway-M, 8 by IMPaLA and 10 by Subpathway-GM; 5 pathways were simultaneously not detected by the four methods (Table 3).

Table 3 The significant subpathways identified by Subpathway-CorSP using metastatic prostate cancer data set (P < 0.05).

Full size table

Special metabolic pathways only significantly associated with metastatic prostate cancer included purine metabolism (path:00230), the beta-alanine metabolism pathway (path:00410), pyrimidine metabolism (path:00240), the glycosphingolipid biosynthesis-globo series (path:00603) and the riboflavin metabolism pathway (path:00740). In these significant subpathways, nodes of interest showed denser expression correlations than the entire corresponding pathways. To the best of our knowledge, these pathways are not directly related to metastatic prostate cancer. However, the significant subpathway (path:00410_2) identified by Subpathway-CorSP captured a key region of beta-alanine and uracil conversion in the beta-alanine metabolism pathway (Fig. 6). Beta-alanine is a naturally occurring beta-amino acid found in vitamin B5 (pantothenic acid). Some research shows that the isomers of sarcosine, α-alanine and β-alanine, increase significantly as prostate cancer progresses to metastasis and are optimized as a group of potential prostate cancer biomarkers³⁸. A metabolite of interest, uracil, is an effective and well tolerated regimen for hormone-refractory prostate cancer. Combinated with tegafur, uracil is effective in treating patients with prostate cancer³⁹. The enzymes coded by genes of interest such as dihydropyrimidine dehydrogenase (NADP+) and dihydropyrimidinase, together with metabolites of interest such as beta-alanine, uracil and dihydrouracil, form a closely connected subpathway with high average Pcc value. Another significant drug metabolism subpathway (path:00983_4) may be also indirectly related to metastatic prostate cancer. In prostate cancer, chemotherapy drugs prompt the secretion of WNT16B by surrounding fibroblast cells, which activate cellular survival Wnt pathways in prostate cancer cells⁴⁰. Metastasis of prostate cancer to bone increases morbidity. Several classes of drugs and treatments have been developed to interfere with oncogenes and oncoproteins known to be involved in the progression of prostate cancer into a more advanced form of disease⁴¹. Overall, these results suggested that specific metabolic subpathways that may be associated with metastatic prostate cancer were found by Subpathway-CorSP.

Discussion

Metabolic pathways are important for interpreting metabolic functions of pathways related to specific diseases^16,17,18. Integrative analysis of metabolites and genes based on pathway structure help to locate and evaluate key metabolic subpathways, and provide the opportunity to understand the mechanisms underlying cancer pathogenesis. From a biological perspective, dysfunctional genes are closely related to dysfunctional metabolites in pathways. Correlation analysis of them might be useful in helping to identify pathways⁴².

Research has been devoted to pathway analysis based on the idea of correlation. For example, EnrichNet performed a random walk with restart to identify crosstalk between known gene sets and pathway genes under the background of gene interaction networks¹⁰. The DECO algorithm is proposed to remove bias caused by correlation of expression data in gene-set analysis, and effectively improves the prediction accuracy of key pathways¹¹. Wang et al. proposed an extension of the linear combination test, which is used for testing correlations between pathways and correlated multiple continuous phenotypes based on gene expression¹². Using gene expression profiles, Guo et al. illustrated that separate analyses of up- and down- regulated genes identify more pathways¹³. These methods are effective for pathway analysis, but have limitations: First, they focused only on pathway genes, overlooking metabolites with vital functions in pathways. Second, they considered either network topology correlation or gene expression correlation, but not the two aspects simultaneously. Third, they predict pathways at the entire pathway level, but do not predict corresponding subpathway regions that may be more meaningful for reflecting relevant biological importance.

We developed a novel subpathway identification strategy called Subpathway-CorSP that integrated metabolites and genes of interest related to a given cancer into metabolic pathways. We identified key metabolic subpathway regions by systematically considering expression correlations and topological features between metabolites and genes of interest within the pathways. Especially, Subpathway-CorSP was an improvement on Subpathway-GM by emphasizing expression correlations between nodes of pathways. For the input data, Subpathway-GM treated DE genes and metabolites as interesting gene and metabolite sets. However, in Subpathway-CorSP, we integrated DE metabolites, DE genes, as well as metabolites and genes with high expression correlation related to the study condition as metabolites and genes of interest. Although the two methods were both based on expression profiles, Subpathway-CorSP further considered expression correlation information of metabolites and genes, defining metabolites and genes with high Pcc value as important. As a subpathway identification method, Subpathway-GM used a lenient distance similarity of signature nodes to locate subpathways. If the shortest path length between two signature nodes was shorter than n+1, then the two signature nodes and other non-signature nodes were added to the same node set. Thus, from a network-structure point of view, Subpathway-GM fully considered topological correlation between nodes within pathways. However, Subpathway-GM did not account for how metabolites and genes within the pathways correlated at the level of expression. Subpathway-CorSP further considered expression correlations of metabolites and genes based on topological features. When locating subpathways, we continued to apply lenient distance similarity and further improved it using CorSP score, instead of just the shortest path length between two nodes of interest. Specifically, for each pathway that contained nodes of interest, we computed Pcc and the shortest path between each two nodes of interest, and then computed CorSP score. When CorSP score between the two nodes of interest was greater than a given threshold, the nodes and other nodes at their shortest path were added to the same node set.

In this study, Subpathway-CorSP was mainly applied to prostate cancer data set. Many pathways corresponding to significant subpathways identified by Subpathway-CorSP were not detected by the traditional pathway identification methods Pathway-G, Pathway-M, IMPaLA and Subpathway-GM. The different results were mainly attributed to two points. One was the difference in use of the data set. Pathway-G used only DE genes, Pathway-M used only DE metabolites. The other difference was that Subpathway-CorSP considered not only pathway topology, but also expression correlations between metabolites and genes of interest. Both IMPaLA and Subpathway-GM did not use expression correlation information. Our analysis found that the average Pcc value for the 20 significant subpathways was significantly higher than for all the metabolic pathways. This result showed that significant subpathways identified by Subpathway-CorSP had clearly higher expression correlations. These results demonstrated that Subpathway-CorSP improved the identification of subpathways by considering expression correlations.

We focused on three of eight pathways identified by Subpathway-CorSP but not the other methods. We found they were highly associated with prostate cancer, suggesting that Subpathway-CorSP was able to identify more prostate cancer-related metabolic pathways. The stability analysis demonstrated stability and reliability of these significant metabolic subpathways. Flexibility could be introduced to this subpathway strategy by varying threshold m, which could be adjusted by users. As threshold m decreased, the number of the other nodes except for nodes of interest within subpathways increased, and thus the scale of the subpathways increased. Application of Subpathway-CorSP to a metastatic sub-group data set showed that Subpathway-CorSP located potential metastatic prostate cancer-related metabolic subpathways. The codes for Subpathway-CorSP can be downloaded at http://222.170.78.233/Subpathway-CorSP/.

We reported the implementation of a novel subpathway identification strategy to investigate the combined effect of metabolites and genes by integration of expression correlations and topological features within the metabolic pathways. The method had some limitations. Although KEGG is a popular and widely used pathway database resource, some true interactions might be missed when KEGG is used as the single data source⁴³. To avoid biased results towards the single data source, in the future we plan to extend Subpathway-CorSP to a group of integrated pathway databases with topology structure, including KEGG⁴³, WikiPathways⁴⁴ and Reactome⁴⁵. Based on multiple integrated pathway databases, more expression correlation information between nodes of the pathways could be obtained, which would further improve the effectiveness of Subpathway-CorSP for subpathway identification. In addition, some pathways, especially those from different pathway databases, partly overlap. Thus, correlations between nodes of pathways are possible. Although Subpathway-CorSP considered correlations arising from network topology, it did not consider correlations resulting from database design. This is a universal challenging situation in current pathway studies. A new strategy of reducing redundancy in pathways from integrated pathway databases might be needed. In addition, Subpathway-CorSP used hypergeometric test to evaluate pathway significance. This process assumed that pathways were independent of each other and did not consider correlations between nodes of pathways. When integrating multiple pathway databases, hypergeometric test would not be suitable to estimate the statistical significance of pathways. Thus, a new statistical method might need to be used in further research. Subpathway-CorSP has considerable potential for implementation of pathway identification associated with other molecules such as microRNAs or the long noncoding RNAs. This resource could help in studying interactions between these noncoding RNAs and human pathways and exploring crosstalk between pathways, greatly facilitating the understanding of the underlying mechanisms for complex diseases.

Materials and Methods

Data sets

Unbiased metabolomic profile including 16 benign prostate samples adjacent to tumor (Benign), 12 localized prostate cancer samples (PCA) and 12 metastatic prostate cancer samples (Mets) was from Sreekumar et al.¹⁸. With high-throughput liquid and gas chromatography-based mass spectrometry, high-throughput profiling of tissues quantitatively detected 626 metabolites, of which 518 were shared by the three diagnostic classes. These metabolites required a name-mapping step to standardize compound IDs⁴⁶. MetaboAnalyst 3.0 (http://www.metaboanalyst.ca) is a comprehensive tool suite for metabolomic data analysis to convert metabolite names to KEGG compound IDs⁴⁷. It has updated the underlying metabolite library based on the latest version of HMDB⁴⁸. Especially, it has also reimplemented the algorithm to improve the performance of fuzzy string matching. Here, we used MetaboAnalyst 3.0 to perform name-mapping by entering 518 metabolite names. The database returned two kinds of results: query names in normal white indicated exact matches; query names highlighted in red indicated no exact or unique match, with multiple hits possible and users manually selecting the correct match if found (http://www.metaboanalyst.ca/faces/Secure/utils/ConvertView.xhtml). We mapped the 518 metabolites to 100 metabolites with KEGG compound IDs. Matched gene expression profile data (16 Benign, 12 PCA and 12 Mets) was downloaded from the GEO database⁴⁹. The accession number of gene expression profile was GSE8511 from platform GPL1708, containing 12,873 genes after processing.

Methods

Subpathway-CorSP identified metabolic subpathway regions by integrating expression correlations and topological features between metabolites and genes of interest. Metabolic subpathways were identified mainly through: (i) mapping metabolites and genes of interest to graphs of metabolic pathways after graph-based reconstruction of metabolic pathways; (ii) developing subpathway identification algorithm and locating metabolic subpathways within pathways according to nodes of interest; (iii) evaluating the statistical significance of metabolic subpathway regions. The detailed steps are described below.

Map metabolites and genes of interest to graphs of pathways

We constructed a bipartite network to represent metabolic pathways in the KEGG pathway database^50,51. We downloaded KGML files (KEGG Markup Language, http://www.genome.jp/kegg/xml/) of KEGG pathways and converted them to list variables in R by applying iSubpathwayMiner, an R package developed by our research group²⁰. We removed the map node in the corresponding KEGG pathway map and focused on molecules such as compounds and gene products. Resulting graphs mainly contained two types of nodes: compounds and enzymes. Edges were mainly constructed from reactions. Specially, if a compound participated in a reaction as a substrate or product, an undirected edge was used to connect the corresponding compound node to the enzyme node. Thus, substrates of a reaction were connected to enzyme nodes and the enzyme nodes were connected to products. Particularly, enzyme nodes used KEGG Orthology (KO) identifiers to overcome limitations of enzyme nomenclatures. KO identifiers integrated pathway and genome information, which have become a better controlled vocabulary for annotating genes to both metabolic and regulatory pathways⁴³. In this way, positional information for metabolites and enzymes was extracted efficiently and used via a graph model that retained pathway structure.

Based on gene expression data and metabolomic experimental data, we respectively identified DE metabolites and DE genes using Wilcoxon rank-sum test and SAM method, and calculated Pcc for each metabolite-metabolite, metabolite-gene and gene-gene pair. We extracted metabolites and genes with Pcc greater than a cutoff such as 0.8, and regarded them as metabolites and genes with high expression correlation. We defined DE metabolites, DE genes, as well as metabolites and genes with high expression correlation as metabolites and genes of interest. Metabolites and genes of interest were mapped to corresponding nodes within pathway graphs. Notably, metabolites of interest were mapped to corresponding compound (substrate and product) nodes and genes of interest were assigned to KO identifiers and matched to enzyme nodes. Mapped nodes within each pathway graph were defined as nodes of interest.

Locate subpathways according to nodes of interest

We developed a novel subpathway identification strategy to locate metabolic subpathway regions by systematically considering expression correlations and topological features between nodes of interest within given pathways. Specifically, for each pathway that contained nodes of interest, we computed Pcc and the shortest path between each two nodes of interest. A CorSP_ij score was computed to evaluate the expressional and topological influence between the ith and jth nodes of interest. CorSP_ij was defined as:

where, c_ij is Pcc between node i and node j based on their expression profiles, d_ij is the shortest path between node i and node j in any one pathway, calculated by breadth-first search algorithm, and β is a parameter set to balance the relationship between Pcc and the shortest path. In this study, we set β = 10.

We located subpathway regions according to CorSP scores between nodes of interest. There are many feasible distance measures in the area of computer science^52,53. This study, we used the lenient distance similarity developed by our group¹⁹. We verified that the subpathways searched by lenient distance similarity were representative of the entire corresponding pathways for both topological centrality and biological interpretation¹⁹. We further improved lenient distance similarity using CorSP scores, instead of just the shortest path length between two nodes of interest. Specifically, if CorSP_ij between the ith and jth nodes of interest was greater than threshold m, the nodes and other nodes at their shortest path were added to the same node set. This process was recurrently computed for all nodes of interest. We extracted corresponding subgraphs in pathway graphs according to each node set, and defined these subgraphs with node number ≥s as subpathway regions because subgraphs with small scales could not usually form biologically significative subpathways.

Evaluate statistical significance of metabolic subpathways

To evaluate the statistical significance of the located subpathways, hypergeometric tests were used, which required the following values: (i) number of metabolites and genes of interest submitted for analysis; (ii) number of background metabolites and genes; (iii) number of metabolites and genes of interest annotated to each metabolic subpathway; and (iv) number of background metabolites and genes annotated to each metabolic subpathway. All metabolites in the Human Metabolome Database (HMDB)⁴⁸ and KEGG Human Pathway⁴³ were used as background metabolites, and all human genes in KEGG as background genes. Then, the following formula was used to calculate P-value for the enrichment significance of the subpathways:

where, m_m (m_g) was number of metabolites (genes) in the entire metabolome (genome), and n_m (n_g) was number of metabolites (genes) of interest, of which r_m (r_g) metabolites (genes) involved in the subpathway containing t_m metabolites (t_g genes).

Additional Information

How to cite this article: Feng, C. et al. Subpathway-CorSP: Identification of metabolic subpathways via integrating expression correlations and topological features between metabolites and genes of interest within pathways. Sci. Rep. 6, 33262; doi: 10.1038/srep33262 (2016).

References

Liu, W. et al. Topologically inferring pathway activity toward precise cancer classification via integrating genomic and metabolomic data: prostate cancer as a case. Sci Rep 5, 13192 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Cairns, R. A., Harris, I. S. & Mak, T. W. Regulation of cancer cell metabolism. Nat Rev Cancer 11, 85–95 (2011).
Article CAS PubMed Google Scholar
Gluckman, P. D., Hanson, M. A., Buklijas, T., Low, F. M. & Beedle, A. S. Epigenetic mechanisms that underpin metabolic and cardiovascular diseases. Nat Rev Endocrinol 5, 401–408 (2009).
Article CAS PubMed Google Scholar
Huang da, W., Sherman, B. T. & Lempicki, R. A. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37, 1–13 (2009).
Article CAS PubMed Google Scholar
Fang, Z., Tian, W. & Ji, H. A network-based gene-weighting approach for pathway analysis. Cell Res 22, 565–580 (2012).
Article CAS PubMed Google Scholar
Pham, L., Christadore, L., Schaus, S. & Kolaczyk, E. D. Network-based prediction for sources of transcriptional dysregulation using latent pathway identification analysis. Proc Natl Acad Sci USA 108, 13347–13352 (2011).
Article ADS PubMed PubMed Central Google Scholar
Chagoyen, M. & Pazos, F. MBRole: enrichment analysis of metabolomic data. Bioinformatics 27, 730–731 (2011).
Article CAS PubMed Google Scholar
Xia, J. & Wishart, D. S. MetPA: a web-based metabolomics tool for pathway analysis and visualization. Bioinformatics 26, 2342–2344 (2010).
Article CAS PubMed Google Scholar
Xia, J. & Wishart, D. S. MSEA: a web-based tool to identify biologically meaningful patterns in quantitative metabolomic data. Nucleic Acids Res 38, W71–W77 (2010).
Article CAS PubMed PubMed Central Google Scholar
Glaab, E., Baudot, A., Krasnogor, N., Schneider, R. & Valencia, A. EnrichNet: network-based gene set enrichment analysis. Bioinformatics 28, i451–i457 (2012).
Article CAS PubMed PubMed Central Google Scholar
Nam, D. De-correlating expression in gene-set analysis. Bioinformatics 26, i511–i516 (2010).
Article CAS PubMed PubMed Central Google Scholar
Wang, X., Pyne, S. & Dinu, I. Gene set enrichment analysis for multiple continuous phenotypes. BMC Bioinformatics 15, 260 (2014).
Article PubMed PubMed Central Google Scholar
Hong, G., Zhang, W., Li, H., Shen, X. & Guo, Z. Separate enrichment analysis of pathways for up- and downregulated genes. J R Soc Interface 11, 20130950 (2014).
Article PubMed PubMed Central Google Scholar
Wang, J. et al. GO-function: deriving biologically relevant functions from statistically significant functions. Brief Bioinform 13, 216–227 (2012).
Article PubMed Google Scholar
Li, C. et al. Characterizing the network of drugs and their affected metabolic subpathways. PLoS One 7, e47326 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Kamburov, A., Cavill, R., Ebbels, T. M., Herwig, R. & Keun, H. C. Integrated pathway-level analysis of transcriptomics and metabolomics data with IMPaLA. Bioinformatics 27, 2917–2918 (2011).
Article CAS PubMed Google Scholar
Opitz, C. A. et al. An endogenous tumour-promoting ligand of the human aryl hydrocarbon receptor. Nature 478, 197–203 (2011).
Article ADS CAS PubMed Google Scholar
Sreekumar, A. et al. Metabolomic profiles delineate potential role for sarcosine in prostate cancer progression. Nature 457, 910–914 (2009).
Article ADS CAS PubMed PubMed Central Google Scholar
Li, C. et al. Subpathway-GM: identification of metabolic subpathways via joint power of interesting genes and metabolites and their topologies within pathways. Nucleic Acids Res 41, e101 (2013).
Article CAS PubMed PubMed Central Google Scholar
Li, C. et al. SubpathwayMiner: a software package for flexible identification of pathways. Nucleic Acids Res 37, e131 (2009).
Article CAS PubMed PubMed Central Google Scholar
Li, X. et al. The implications of relationships between human diseases and metabolic subpathways. PLoS One 6, e21131 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
Lee, J. et al. Integrated omics-analysis reveals Wnt-mediated NAD+ metabolic reprogramming in cancer stem-like cells. Oncotarget (2016).
Camacho, D., de la Fuente, A. & Mendes, P. The origin of correlations in metabolomics data. Metabolomics 1, 53–63 (2005).
Article CAS Google Scholar
Haukaas, T. H. et al. Metabolic clusters of breast cancer in relation to gene- and protein expression subtypes. Cancer Metab 4, 12 (2016).
Article PubMed PubMed Central Google Scholar
Ponzoni, I. et al. Pathway network inference from gene expression data. BMC Syst Biol 8 Suppl 2, S7 (2014).
Article PubMed PubMed Central Google Scholar
Madhu, B. et al. Metabolomic changes during cellular transformation monitored by metabolite-metabolite correlation analysis and correlated with gene expression. Metabolomics 11, 1848–1863 (2015).
Article CAS PubMed PubMed Central Google Scholar
Steuer, R. Review: on the analysis and interpretation of correlations in metabolomic data. Brief Bioinform 7, 151–158 (2006).
Article CAS PubMed Google Scholar
Yen, C. L., Stone, S. J., Koliwad, S., Harris, C. & Farese, R. V., Jr. Thematic review series: glycerolipids. DGAT enzymes and triacylglycerol biosynthesis. J Lipid Res 49, 2283–2301 (2008).
Article CAS PubMed PubMed Central Google Scholar
Scott, S. A., Mathews, T. P., Ivanova, P. T., Lindsley, C. W. & Brown, H. A. Chemical modulation of glycerolipid signaling and metabolic pathways. Biochim Biophys Acta 1841, 1060–1084 (2014).
Article CAS PubMed PubMed Central Google Scholar
Prentki, M. & Madiraju, S. R. Glycerolipid metabolism and signaling in health and disease. Endocr Rev 29, 647–676 (2008).
Article CAS PubMed Google Scholar
Wallner, L. P. et al. Quality of preventive care before and after prostate cancer diagnosis. J Mens Health 11, 14–21 (2015).
PubMed PubMed Central Google Scholar
Nomura, D. K. et al. Monoacylglycerol lipase exerts dual control over endocannabinoid and fatty acid pathways to support prostate cancer. Chem Biol 18, 846–856 (2011).
Article CAS PubMed PubMed Central Google Scholar
Stabler, S. et al. Serum methionine metabolites are risk factors for metastatic prostate cancer progression. PLoS One 6, e22486 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
Cheng, Y. et al. Metabolomics in bladder cancer: a systematic review. Int J Clin Exp Med 8, 11052–11063 (2015).
PubMed PubMed Central Google Scholar
Liu, X., Fu, Y. M. & Meadows, G. G. Differential effects of specific amino acid restriction on glucose metabolism, reduction/oxidation status and mitochondrial damage in DU145 and PC3 prostate cancer cells. Oncol Lett 2, 349–355 (2011).
Article CAS PubMed PubMed Central Google Scholar
Merk, M. et al. The D-dopachrome tautomerase (DDT) gene product is a cytokine and functional homolog of macrophage migration inhibitory factor (MIF). Proc Natl Acad Sci USA 108, E577–E585 (2011).
Article ADS PubMed PubMed Central Google Scholar
Hu, H., Wang, W., Tang, H. & Xu, P. Characterization of Pseudooxynicotine Amine Oxidase of Pseudomonas putida S16 that Is Crucial for Nicotine Degradation. Sci Rep 5, 17770 (2015).
Article ADS CAS PubMed PubMed Central Google Scholar
Soliman, L. C., Hui, Y., Hewavitharana, A. K. & Chen, D. D. Monitoring potential prostate cancer biomarkers in urine by capillary electrophoresis-tandem mass spectrometry. J Chromatogr A 1267, 162–169 (2012).
Article CAS PubMed Google Scholar
Hatano, K. et al. Retrospective analysis of an oral combination of dexamethasone, uracil plus tegafur and cyclophosphamide for hormone-refractory prostate cancer. Jpn J Clin Oncol 41, 253–259 (2011).
Article PubMed Google Scholar
Sun, Y. et al. Treatment-induced damage to the tumor microenvironment promotes prostate cancer therapy resistance through WNT16B. Nat Med 18, 1359–1368 (2012).
Article CAS PubMed PubMed Central Google Scholar
Soekmadji, C. & Nelson, C. C. The Emerging Role of Extracellular Vesicle-Mediated Drug Resistance in Cancers: Implications in Advanced Prostate Cancer. Biomed Res Int 2015, 454837 (2015).
Article CAS PubMed PubMed Central Google Scholar
Poisson, L. M., Sreekumar, A., Chinnaiyan, A. M. & Ghosh, D. Pathway-directed weighted testing procedures for the integrative analysis of gene expression and metabolomic data. Genomics 99, 265–274 (2012).
Article CAS PubMed Google Scholar
Kanehisa, M. et al. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res 34, D354–D357 (2006).
Article CAS PubMed Google Scholar
Pico, A. R. et al. WikiPathways: pathway editing for the people. PLoS Biol 6, e184 (2008).
Article CAS PubMed PubMed Central Google Scholar
Joshi-Tope, G. et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res 33, D428–D432 (2005).
Article CAS PubMed Google Scholar
Redestig, H. et al. Consolidating metabolite identifiers to enable contextual and multi-platform metabolomics data analysis. BMC Bioinformatics 11, 214 (2010).
Article CAS PubMed PubMed Central Google Scholar
Xia, J., Sinelnikov, I. V., Han, B. & Wishart, D. S. MetaboAnalyst 3.0–making metabolomics more meaningful. Nucleic Acids Res 43, W251–W257 (2015).
Article CAS PubMed PubMed Central Google Scholar
Wishart, D. S. et al. HMDB: a knowledgebase for the human metabolome. Nucleic Acids Res 37, D603–D610 (2009).
Article CAS PubMed Google Scholar
Edgar, R., Domrachev, M. & Lash, A. E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res 30, 207–210 (2002).
Article CAS PubMed PubMed Central Google Scholar
Arita, M. The metabolic world of Escherichia coli is not small. Proc Natl Acad Sci USA 101, 1543–1547 (2004).
Article ADS CAS PubMed PubMed Central Google Scholar
Takemoto, K. Current understanding of the formation and adaptation of metabolic systems based on network theory. Metabolites 2, 429–457 (2012).
Article CAS PubMed PubMed Central Google Scholar
Emmert-Streib, F., Dehmer, M. & Shi, Y. Fifty years of graph matching, network alignment and network comparison. Information Sciences s 346–347, 180–197 (2016).
Article MathSciNet MATH Google Scholar
Dehmer, M., Emmert-Streib, F. & Shi, Y. Interrelations of graph distance measures based on topological indices. PLoS One 9, e94985 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by National Natural Science Foundation of China [Grant No. 81572341 and 31401127]; Yu Weihan Outstanding Youth Training Fund of Harbin Medical University; and Harbin medical university scientific research innovation fund [Grant No. 2016JCZX03].

Author information

Feng Chenchen, Zhang Jian and Li Xuecang contributed equally to this work.

Authors and Affiliations

School of Medical Informatics, Daqing Campus, Harbin Medical University, Daqing, 163319, China
Chenchen Feng, Jian Zhang, Xuecang Li, Bo Ai, Meng Li & Chunquan Li
College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, China
Junwei Han & Shang Li
School of Nursing, Daqing Campus, Harbin Medical University, Daqing, 163319, China
Qiuyu Wang
School of Pharmacy, Daqing Campus, Harbin Medical University, Daqing, 163319, China
Taiming Wei
The fifth Affiliated Hospital of Harbin Medical University, Daqing, 163319, China
Yong Xu
Department of Pharmacology, Daqing Campus, Harbin Medical University, Daqing, 163319, China
Chao Song

Authors

Chenchen Feng
View author publications
You can also search for this author in PubMed Google Scholar
Jian Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Xuecang Li
View author publications
You can also search for this author in PubMed Google Scholar
Bo Ai
View author publications
You can also search for this author in PubMed Google Scholar
Junwei Han
View author publications
You can also search for this author in PubMed Google Scholar
Qiuyu Wang
View author publications
You can also search for this author in PubMed Google Scholar
Taiming Wei
View author publications
You can also search for this author in PubMed Google Scholar
Yong Xu
View author publications
You can also search for this author in PubMed Google Scholar
Meng Li
View author publications
You can also search for this author in PubMed Google Scholar
Shang Li
View author publications
You can also search for this author in PubMed Google Scholar
Chao Song
View author publications
You can also search for this author in PubMed Google Scholar
Chunquan Li
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

C.F., J.Z. and C.L. designed the research; C.F., J.Z., X.L., S.L. and C.S. performed the research; C.F., B.A., J.H. and Q.W. analyzed the data; C.F., T.W., Y.X. and M.L. wrote the manuscript. All authors reviewed the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Reprints and permissions

About this article

Cite this article

Feng, C., Zhang, J., Li, X. et al. Subpathway-CorSP: Identification of metabolic subpathways via integrating expression correlations and topological features between metabolites and genes of interest within pathways. Sci Rep 6, 33262 (2016). https://doi.org/10.1038/srep33262

Download citation

Received: 05 April 2016
Accepted: 24 August 2016
Published: 14 September 2016
DOI: https://doi.org/10.1038/srep33262

This article is cited by

Comprehensive functional enrichment analysis of male infertility
- Seyed Morteza Razavi
- Marjan Sabbaghian
- Ali Salehzadeh-Yazdi
Scientific Reports (2017)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.