Signaling pathways are a cornerstone of systems biology. Several databases store high-quality representations of these pathways that are amenable to automated analyses. Despite painstaking manual curation, these databases remain incomplete. We present PATHLINKER, a new computational method to reconstruct the interactions in a signaling pathway of interest. PATHLINKER efficiently computes multiple short paths from the receptors to transcriptional regulators (TRs) in a pathway within a background protein interaction network. We use PATHLINKER to accurately reconstruct a comprehensive set of signaling pathways from the NetPath and KEGG databases. We show that PATHLINKER has higher precision and recall than several state-of-the-art algorithms, while also ensuring that the resulting network connects receptor proteins to TRs. PATHLINKER’s reconstruction of the Wnt pathway identified CFTR, an ABC class chloride ion channel transporter, as a novel intermediary that facilitates the signaling of Ryk to Dab2, which are known components of Wnt/β-catenin signaling. In HEK293 cells, we show that the Ryk–CFTR–Dab2 path is a novel amplifier of β-catenin signaling specifically in response to 4 of the 11 Wnts tested (Wnt 1, 2, 3, and 3a). PATHLINKER captures the structure of signaling pathways as represented in pathway databases better than existing methods. PATHLINKER’s success in reconstructing pathways from the NetPath and KEGG databases points to its applicability for complementing manual curation of these databases. PATHLINKER may serve as a promising approach for prioritizing proteins and interactions for experimental study, as illustrated by its discovery of a novel pathway in Wnt/β-catenin signaling.
Our supplementary website at http://bioinformatics.cs.vt.edu/~murali/supplements/2016-sys-bio-applications-pathlinker/ provides links to the PATHLINKER software, input datasets, PATHLINKER reconstructions of NetPath pathways, and links to interactive visualizations of these reconstructions on GraphSpace.
A major focus in systems biology is the identification of the networks of reactions that guide the propagation of cellular signals from receptors to downstream transcriptional regulators (TRs). Over the past two decades, databases have been developed to store the interactions present in signaling pathways,1,
Inspired by these challenges, we sought to develop a computational approach to automatically reconstruct signaling pathways from a background network of molecular interactions (the interactome). We conceptualized the problem as follows (Figure 1): given as input only the receptors and the transcription factors/regulators (TRs) in a specific signaling pathway, can we analyze the interactome to recover the pathway with high accuracy? Several earlier methods have addressed a computationally similar problem of connecting a set of sources or “causes” (akin to receptors) to a set of targets or “effects” (akin to TRs) through a compact sub-network of the interactome.6,
We develop PATHLINKER, an algorithm that satisfies both criteria. PATHLINKER finds the k highest scoring paths from any receptor to any TR, where k is a user-defined parameter (Figure 1). As the value of k increases, the reconstruction grows smoothly to capture more interactions in the curated pathways. By design, every interaction in the reconstruction lies on some path from a receptor to a TR. Thus, PATHLINKER satisfies both criteria for a reconstruction algorithm.
We apply PATHLINKER to a comprehensive set of 15 signaling pathways in the NetPath database3 and 32 pathways in the KEGG database,5 both of which are manually curated. Compared with several other approaches,15,
We first evaluated the ability of PATHLINKER and other algorithms to reconstruct a diverse collection of 15 signaling pathways in the NetPath database (Supplementary Section S1). We then experimentally validated a novel prediction from PATHLINKER on the Wnt signaling pathway.
Pathway reconstructions from the NetPath database
Comparison to other algorithms
We compared PATHLINKER with six other network-based algorithms (Table 1), including shortest path (SHORTESTPATHS, BOWTIEBUILDER19), random walk with restarts (RWR20), network flow (RESPONSENET17), Steiner forest (PCSF15), ANAT,18 and a greedy seed-based method (Ingenuity Pathway Analyzer, IPA16). Brief descriptions of these methods and the user-defined parameters we selected appear in Supplementary Section S2.
For each pathway reconstruction, we used the interactions in the NetPath pathway as the set of positives and a subsampled set of interactions not present in the NetPath pathway as the set of negatives (Supplementary Section S3). For each algorithm, we aggregated the reconstructions of these pathways to measure precision and recall (Figure 2a and Supplementary Section S3). We observed that ANAT, PCSF, RESPONSENET, SHORTESTPATHS, and BOWTIEBUILDER each achieved recall below 0.1. While IPA returned sub-networks with larger recall, its precision never exceeded 0.2. RWR achieved the best precision for recall values between 0.05 and 0.13, and PATHLINKER and RWR were comparable for all other values of recall.
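The precision-recall computation described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: it assumes interactions are `(u, v)` tuples, draws negatives uniformly at random at a hypothetical `ratio` per positive (the actual scheme is in Supplementary Section S3), and ignores ties among interactions that share a rank.

```python
import random


def subsample_negatives(candidate_edges, positives, ratio, rng):
    """Draw `ratio` negatives per positive from edges outside the curated pathway.

    `ratio` is a hypothetical parameter used only for illustration.
    """
    pool = sorted(set(candidate_edges) - set(positives))  # sorted for reproducibility
    n = min(len(pool), ratio * len(positives))
    return set(rng.sample(pool, n))


def precision_recall(ranked_edges, positives, negatives):
    """Walk a ranked edge list and record (precision, recall) at each labeled edge.

    Edges that are neither positives nor sampled negatives are skipped, and
    edges sharing a rank are treated as if they were strictly ordered.
    """
    positives, negatives = set(positives), set(negatives)
    tp = fp = 0
    curve = []
    for edge in ranked_edges:
        if edge in positives:
            tp += 1
        elif edge in negatives:
            fp += 1
        else:
            continue  # unlabeled edge: contributes to neither count
        curve.append((tp / (tp + fp), tp / len(positives)))
    return curve


ranked = [("R", "A"), ("A", "X"), ("A", "T"), ("X", "Y")]
print(precision_recall(ranked, {("R", "A"), ("A", "T")}, {("A", "X")}))
# [(1.0, 0.5), (0.5, 0.5), (0.6666666666666666, 1.0)]
```

Aggregating over pathways, as in Figure 2a, would amount to pooling the labeled edges of all reconstructions before walking the ranked list.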
To determine the source of the false positive interactions in PATHLINKER compared with RWR, we asked whether the false positives were “close” to the pathway as represented in the NetPath database. First, we recomputed the precision of all algorithms after excluding, before subsampling the negatives, every negative interaction that involved at least one protein in the NetPath pathway (“pathway-adjacent negatives”) (Figure 2b). This modification increased the precision of all the algorithms, with PATHLINKER clearly dominating all the other methods at recall values between 0.2 and 0.6. To investigate this trend further, we computed each interaction’s distance from the nearest protein in the pathway, where a distance of zero indicated a true positive and a distance of one indicated a pathway-adjacent negative (Figure 2c and Supplementary Section S3). At a recall of 0.2, RWR’s reconstruction contained a larger proportion of true positives (purple regions) than PATHLINKER’s, while the proportions of true positives were similar at recall 0.4 and 0.6. However, PATHLINKER’s reconstructions contained a larger proportion of interactions at a distance of 1 from the pathway (dark blue regions) at all three values of recall, indicating that PATHLINKER’s false positives were closer to the pathway than RWR’s.
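One way to compute the distance categories in Figure 2c is a multi-source breadth-first search from the pathway proteins. The definition below is an assumption chosen to match the text (0 for a true positive, 1 for a pathway-adjacent negative): a negative interaction's distance is one more than the hop distance of its closer endpoint to the pathway, measured on the undirected interactome. All function names are hypothetical.

```python
import math
from collections import deque


def undirected_adjacency(edges):
    """Symmetric adjacency sets built from (u, v) interaction tuples."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hop_distances(adj, pathway_nodes):
    """Multi-source BFS: hops from every reachable protein to the nearest pathway protein."""
    dist = {p: 0 for p in pathway_nodes}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def edge_distance(edge, positives, dist):
    """0 for a curated interaction; otherwise 1 + the closer endpoint's hop distance.

    Under this assumed definition, a negative touching a pathway protein scores 1
    (a pathway-adjacent negative); edges unreachable from the pathway score inf.
    """
    if edge in positives:
        return 0
    u, v = edge
    return 1 + min(dist.get(u, math.inf), dist.get(v, math.inf))


def drop_pathway_adjacent(negatives, pathway_nodes):
    """Keep only negatives with neither endpoint in the pathway (the Figure 2b filter)."""
    return {(u, v) for u, v in negatives
            if u not in pathway_nodes and v not in pathway_nodes}
```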
To compare PATHLINKER and RWR on the criterion that a reconstruction should connect receptors to TRs, we assessed how quickly each method recovered the curated receptors and TRs. For each method, we recorded the rank of the first interaction that contained each receptor or TR. Figure 2d shows the results for the first 1,000 ranked interactions, and Supplementary Figure S2 shows the full ranking. PATHLINKER and RWR recovered receptors at about the same rate, although PATHLINKER’s long tail indicated that the last few receptors were difficult for it to retrieve. In contrast, PATHLINKER recovered 90% of the TRs in the pathways within the first 1,000 ranked interactions, compared with only 38% for RWR.
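The receptor and TR recovery analysis reduces to finding, for each protein, the rank of the first interaction that contains it. A minimal sketch, with hypothetical function names and ranked interactions given as `(u, v)` tuples:

```python
def first_appearance(ranked_edges, proteins):
    """1-based rank of the first edge containing each protein (None if never reached)."""
    first = {p: None for p in proteins}
    for rank, (u, v) in enumerate(ranked_edges, start=1):
        for p in (u, v):
            if p in first and first[p] is None:
                first[p] = rank
    return first


def fraction_recovered(first, cutoff):
    """Fraction of the proteins that appear within the top `cutoff` ranked edges."""
    hits = sum(1 for r in first.values() if r is not None and r <= cutoff)
    return hits / len(first)
```

Calling `fraction_recovered` with a cutoff of 1,000 on the TRs would give the kind of summary reported for Figure 2d.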
Evaluation of PATHLINKER’s performance
We assessed PATHLINKER’s performance in several additional ways to investigate its robustness to the inputs and its effectiveness for other pathway databases. First, we added (incorrect) receptors/TRs to the input or removed correct receptors/TRs from the input and compared the resulting reconstructions (Figure 2e and Supplementary Section S3). When we deleted 30% of the receptors and 30% of the TRs from the input, the mean precision at recall of 0.3 and 0.6 dropped by 11% (from 0.42 to 0.38) and 27% (from 0.28 to 0.22), respectively, compared with the precision values with the correct inputs (Supplementary Figure S3). The results were similar for random additions of 30% of the receptors and 30% of the TRs.
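The input perturbations can be sketched as random deletions from, or additions to, the receptor and TR sets. This is an illustrative assumption about the procedure, not the authors' code: the number of proteins changed is `fraction` times the set size rounded to the nearest integer, and added proteins are drawn uniformly from a hypothetical `candidates` pool of interactome proteins not already in the input.

```python
import random


def perturb_inputs(nodes, candidates, fraction, mode, rng):
    """Randomly delete or add `fraction` of a receptor or TR set.

    The count is rounded to the nearest integer (an assumption); `candidates`
    is a hypothetical pool of interactome proteins to draw additions from.
    """
    nodes = set(nodes)
    n = round(fraction * len(nodes))
    if mode == "delete":
        return nodes - set(rng.sample(sorted(nodes), n))
    if mode == "add":
        pool = sorted(set(candidates) - nodes)  # proteins not already in the input
        return nodes | set(rng.sample(pool, min(n, len(pool))))
    raise ValueError(f"unknown mode: {mode!r}")
```

Passing a seeded `random.Random` makes each perturbation reproducible, so the same perturbed inputs can be reused across algorithms.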
Second, we evaluated the performance of recovering proteins in the reconstructions. At similar values of recall, PATHLINKER’s precision for protein recovery was much higher than that for interaction recovery (Figure 2f). In fact, the precision values of all algorithms improved considerably (comparing Figures 2a,b with Supplementary Figure S4). When excluding proteins that have an interaction with at least one protein in the pathway, all algorithms have nearly perfect precision (Supplementary Figure S4).
Our analysis thus far relied on 15 pathways from a single database. Our last three assessments estimated the effect of interactions present only in NetPath and extended the scope of the analysis to a larger set of NetPath pathways and to the KEGG database. First, we estimated the reliance of our reconstructions on NetPath-only interactions by applying PATHLINKER to an interactome that excluded these interactions. Only 4% of the interactions in the interactome were present in at least one NetPath pathway; further, 35% of these interactions were supported solely by NetPath (Supplementary Table S5). To evaluate the resulting reconstruction, we used the 65% of NetPath interactions that remained in the interactome as positives. Although the proportion of positives in the interactome dropped from 4% to 2.6%, PATHLINKER’s performance was comparable to that in the original interactome (Supplementary Figure S5). Next, we applied PATHLINKER and RWR to an expanded set of 29 NetPath pathways that contained at least one receptor and at least one TR; i.e., we removed the criterion that at least three paths should connect receptors to TRs in each pathway. We observed similar trends in performance on the expanded set as on the original set of 15 pathways (Supplementary Figure S6). When we ignored pathway-adjacent negatives, the precision of the reconstructions for the expanded set was lower than for the original set. Nevertheless, PATHLINKER still clearly outperformed RWR (Supplementary Figure S6). Finally, we assessed the performance of PATHLINKER on another signaling pathway database: we computed aggregate precision and recall over the reconstructions of 32 KEGG signaling pathways that contained at least three paths from receptors to TRs, excluding disease pathways (Figure 2g).
The aggregate precision-recall curves for NetPath and KEGG pathways were comparable, with PATHLINKER performing slightly better on NetPath pathways at very low (<0.05) and high (>0.4) values of recall.
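As a quick consistency check on the 2.6% figure in the NetPath-only analysis, assuming the reported 4% and 35% are exact: removing the NetPath-only interactions keeps 65% of the positives and also shrinks the interactome slightly, so

$$\frac{0.04 \times (1 - 0.35)}{1 - 0.04 \times 0.35} = \frac{0.026}{0.986} \approx 0.0264,$$

which rounds to about 2.6% whether or not the small change in interactome size is taken into account.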
Wnt pathway reconstructions
We visualized the topologies of the Wnt pathway reconstructions from PATHLINKER, RWR, and IPA at a recall of 0.20 (Figure 3a and Supplementary Table S6). We selected these three methods because every other approach achieved a recall of at most 0.13 for the Wnt pathway (Supplementary Figure S7). In addition to the true positive interactions from NetPath (green edges), all three reconstructions contained interactions that are present in KEGG but missing from NetPath (purple edges). IPA had slightly higher precision than PATHLINKER and RWR; however, its reconstruction contained 13 connected components, and only 3 TRs were connected to receptors. RWR’s reconstruction contained two connected components and only two TRs. In contrast, PATHLINKER produced a reconstruction with many receptor-to-TR paths that contain NetPath and KEGG interactions, including 10 of the 13 TRs.
To more carefully explore the highest ranked paths in the PATHLINKER reconstruction, we examined the network formed by the top 200 paths computed by PATHLINKER using the receptors and TRs in the Wnt pathway in NetPath (Figure 3b). For this analysis, we added two receptors that were missing from the earlier precision-recall analysis (Supplementary Section S1). The PATHLINKER network included 16 proteins not previously known to be in the NetPath or KEGG representations of the Wnt pathway (gray or orange nodes in Figure 3b). Fifteen of these proteins are either involved in Wnt crosstalk, have been shown to be involved in β-catenin signaling in non-human models, or are involved in general post-translational protein modifications (Supplementary Section S5).
The remaining protein, CFTR, was the highest ranked of all proteins not previously known to be in the Wnt pathway in the NetPath or KEGG databases. It appeared in the 59th path computed by PATHLINKER (Figure 3b). PATHLINKER indicated that CFTR acted as a signal transducer from Ryk, a receptor tyrosine kinase involved in Wnt signaling and organismal development,21,
Exploring the role of CFTR in Wnt signaling
We designed a series of experiments to determine the role of Ryk, CFTR, and Dab2 in Wnt/β-catenin-mediated signaling as predicted by PATHLINKER (blue region in Figure 3b). We used a quantitative TCF/LEF luciferase reporter assay and measurement of cellular β-catenin levels to determine whether silencing Ryk, CFTR, or Dab2 has a specific effect on Wnt/β-catenin signaling. We employed the Wnt plasmid library28 to transiently express 11 different secreted Wnt proteins (hereafter referred to as Wnts) in HEK293 cells. Transient expression of Wnts has previously been shown to induce expression of a luciferase reporter driven by a synthetic, tandem TCF/LEF promoter when co-transfected into HEK293 cells.28 We quantified the TCF/LEF-promoted luciferase activity induced by each of the 11 Wnt proteins tested (Figure 4a). Transient expression of Wnt 1, 2, 3, and 3a resulted in robust TCF/LEF-promoted luciferase activity (⩾30-fold), while Wnt 2b2, 6, 7a, 7b, 8a, 9b, and 10b promoted such activity to a much lesser extent (<30-fold) in comparison to control samples not treated with Wnt.
We then assessed by western blot the efficacy of dose-dependent transient siRNA silencing of CFTR, Dab2, and Ryk in HEK293 cells (Figure 4b). In the No Wnt control cells, cellular levels of β-catenin were not noticeably perturbed by siRNA silencing of CFTR or Ryk, but increased as cellular protein levels of Dab2 decreased. In these No Wnt control cells, silencing Ryk, Dab2, or CFTR did not significantly change TCF/LEF-promoted luciferase activity. When Dab2 or CFTR was silenced, both TCF/LEF-promoted luciferase activity (Figures 4c,d) and β-catenin levels determined by western blot (Figure 4e) increased significantly for cells stimulated by nearly all Wnts (the exception being the β-catenin measurement for Wnt 2b2 after Dab2 silencing) in comparison to control cells treated with scrambled siRNA. Conversely, when Ryk was silenced, TCF/LEF-promoted luciferase activity was significantly ablated and cellular β-catenin levels decreased, but only in the presence of Wnt 1, 2, 3, or 3a, in comparison to control cells treated with scrambled siRNA. We noted no significant difference in TCF/LEF-promoted luciferase reporter activity or cellular β-catenin levels for cells expressing Wnt 6, 7a, 7b, 8a, 9b, or 10b in comparison to the scrambled siRNA control.
Cellular β-catenin levels determined by western blot were in accord with the activation of TCF/LEF promoter when stimulated by the respective Wnt (Figure 4e). In the presence of a stimulatory Wnt (specifically Wnt 1, 2, 3, and 3a), an increase in β-catenin levels in comparison to the No Wnt Control correlated with increased TCF/LEF-promoted luciferase activity (Figure 4f and Supplementary Figure S9). In instances where normalized relative luminescence was ablated, quantification of β-catenin was marginal or diminished as well (Figure 4f).
Using endogenous CFTR as bait, we co-immunoprecipitated both Ryk and Dab2 in No Wnt control cells (Figure 4g). These interactions were qualitatively diminished in HEK293 cells transiently expressing Wnt 1, 2, 2b2, 3, and 3a. We hypothesize that Wnt-mediated receptor endocytosis routes CFTR to the degradation pathway rather than to membrane recycling, decreasing cellular levels of CFTR and potentially of Ryk and Dab2. Further studies on cellular trafficking of the Ryk–CFTR–Dab2 complex will provide insight into these results.
Reconstructing multiple pathways
We have considered two distinct types of algorithms: those that returned a single sub-network, producing a single point in precision-recall space (SHORTESTPATHS, RESPONSENET, PCSF, ANAT, BOWTIEBUILDER, and IPA), and those that provided a ranked list of interactions, producing precision-recall curves (PATHLINKER and RWR). In the case of IPA, since changing parameters yielded networks with substantially different precision and recall, we present results for this algorithm for nine parameter values. Since the single sub-network approaches had the goal of computing compact sub-networks that connected sources to targets, they were able to reconstruct pathways with high precision but only with low recall. Only the algorithms that offered a ranked list of interactions, PATHLINKER and RWR, reached a recall of ⩾0.6. These results show that a key feature of a pathway reconstruction algorithm is a parameter, such as k, whose increase smoothly expands the resulting network. While both RWR and PATHLINKER had this property, only PATHLINKER offered an additional guarantee of connecting receptors to TRs (Figure 2d and the networks in Figure 3a). We conclude that PATHLINKER reconstructions captured the structure of signaling pathways much better than IPA and RWR, despite comparable performance in terms of precision and recall.
Several previous studies have focused on recovering only the proteins within a pathway, a methodology commonly used to predict the biological processes of which a protein may be a member.29 The precision of all algorithms improved considerably when we evaluated the proteins, rather than the interactions, in the pathway reconstructions (Figure 2f), demonstrating that reconstructing the interactions within a pathway is a more challenging problem than recalling the proteins in the pathway. In addition, false positive interactions in reconstructions that are “near” the curated pathway may indeed represent valid interactions that have not yet been added to the pathway through the curation process (Figures 2b,c). High-confidence predictions adjacent to the pathway may be ideal candidates for further experimental studies aimed at expanding known signaling pathways.
Novel role of the Ryk–CFTR–Dab2 path in Wnt/β-catenin signaling
Wnt proteins are essential for development in higher eukaryotes, cellular homeostasis, and wound healing. The canonical Wnt signaling pathway has been shown to be specific for a subset of Wnts, while other Wnts are known to signal through alternate means (reviewed in the study by MacDonald et al.30). Using 11 of the 19 known Wnts, we extend this understanding by showing that the tested Wnts activate the TCF/LEF promoter via β-catenin to markedly different degrees. We show that Wnts 1, 2, 3, and 3a are capable of ⩾30-fold activation of the TCF/LEF promoter, and do so in part via a novel Ryk–CFTR–Dab2 pathway that further regulates the cellular levels of β-catenin.
Ryk is a predicted tyrosine-protein kinase containing an extracellular WIF domain that has been shown to bind directly to Wnt 1 and Wnt 3a, though its signaling mechanism was unknown.23 Silencing of Ryk by siRNA in mice results in defects in axon guidance and neurite outgrowth in response to Wnt 3a induction.22 The interaction between Ryk and CFTR was first identified in the CFTR interactome30 and was not directly pertinent to that study’s dissection of the interaction between CFTR and the Hsp90 co-chaperone Aha1.31 We validated the Ryk–CFTR and CFTR–Dab2 interactions via co-immunoprecipitation. CFTR functions intrinsically as a membrane chloride ion channel, and known point mutations impair its function, leading to the clinical manifestation of cystic fibrosis.32 CFTR is affected by intracellular calcium (reviewed in the study by Antigny et al.33), an alternate product of certain non-canonical Wnt signaling pathways.33,34 Dab2 is involved in endosomal recycling and degradation of CFTR and is a well-known regulatory component of receptor-mediated endocytosis.35,36 Dab2 also functions as a negative regulator of the β-catenin destruction complex.26,37,38 Although these functions had been identified independently, there was no prior evidence for, or speculation about, a role of CFTR in Wnt/β-catenin-mediated signaling, particularly via Ryk or Dab2.
We present a model incorporating the Ryk–CFTR–Dab2 pathway as an amplifier of Wnt 1-, 2-, 3-, and 3a-specific β-catenin signaling (Figure 5). Our results suggest the recruitment of Dab2 to the Ryk–CFTR membrane complex in the presence of specific Wnt proteins. This process further impedes the formation of the β-catenin destruction complex, thereby freeing additional β-catenin to further amplify TCF/LEF promoter transcription. It is currently unknown whether Wnt signaling via Ryk modifies the ion transport function of CFTR in preparation for context-specific cellular processes or whether Wnt-specific signaling facilitates the degradation of CFTR. Further molecular characterizations are required to provide insight into the novel role of CFTR in facilitating Wnt 1-, 2-, 3-, and 3a-specific signaling.
In conclusion, we have presented PATHLINKER, an algorithm that automates the reconstruction of human signaling pathways by connecting the receptors and TRs for a pathway through a physical and regulatory interaction network. Based on our comprehensive analysis of 15 NetPath pathways, PATHLINKER achieved much higher recall (while maintaining reasonable precision) than several other methods. Furthermore, it was the only method that could control the size of the reconstruction while ensuring that receptors were connected to TRs in the result. PATHLINKER’s reconstruction of the Wnt pathway indicated that CFTR facilitates the signaling from Ryk to Dab2. In HEK293 cells, we validated this path experimentally and showed its specificity for 4 of the 11 Wnts tested (Wnt 1, 2, 3, and 3a). Based on these results, we propose a model in which Dab2 is recruited to the Ryk–CFTR membrane complex in response to a defined Wnt stimulus, ultimately amplifying Wnt 1, 2, 3, and 3a canonical signaling. In summary, PATHLINKER provides a promising framework for reconstructing a well-studied signaling pathway given relatively little information about its components. It may serve as a powerful approach for discovering the structure of poorly studied processes and prioritizing both proteins and interactions for experimental study.
Materials and methods
The problem of pathway reconstruction takes as input (i) a weighted directed interactome G containing physical and regulatory interactions between pairs of proteins, (ii) the receptors S in a signaling pathway of interest, and (iii) the TRs T in the same pathway. A reconstruction of a pathway P consists of a sub-network of G that connects the receptors in P to the TRs in P using proteins and interactions in G.
Given an interactome G=(V, E), where every edge e in E has an associated weight $w_e$ between 0 and 1, a receptor set S, a TR set T, and a user-defined parameter k, PATHLINKER computes the k highest scoring loopless paths that begin at any receptor in S and terminate at any TR in T. We define the score of a path to be the product of the edge weights along the path. We add an artificial source s with a directed edge (s, x) for each node x∈S and an artificial sink t with a directed edge (y, t) for each node y∈T. We assign the following cost to each edge (u, v):

$$c_{uv} = \begin{cases} 0, & \text{if } u = s \text{ or } v = t,\\ -\log w_{uv}, & \text{otherwise.} \end{cases}$$

Let the cost of a path be the sum of the costs of the edges in the path. Because $\sum_{e} -\log w_e = -\log \prod_{e} w_e$ and $-\log$ is strictly decreasing, the least costly s↝t path is equivalent to the path from S to T that maximizes the path score. PATHLINKER computes the k highest scoring paths in this modified graph using a novel integration of Yen’s algorithm39 with the A* heuristic (Supplementary Section S6). This technique is up to 41 times faster than Yen’s algorithm by itself (Supplementary Figure S8) and is thus capable of handling the complexity of human interaction networks and signaling pathways.
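The core of the search can be sketched with the −log cost transform and the standard Yen's algorithm over Dijkstra's shortest paths. This sketch deliberately swaps in plain Yen's algorithm rather than PATHLINKER's A*-accelerated variant, so it should return the same k paths (up to ties) but without the reported speedup. The artificial node names `s*` and `t*` and all function names are hypothetical; edges are `(u, v, w)` tuples with 0 < w ≤ 1.

```python
import heapq
import math


def build_cost_graph(weighted_edges, receptors, trs, source="s*", sink="t*"):
    """Cost graph: -log(w) on interactome edges, 0 on artificial source/sink edges."""
    graph = {}
    for u, v, w in weighted_edges:
        if w > 0:  # -log(0) is undefined, so zero-weight edges are dropped
            graph.setdefault(u, {})[v] = -math.log(w)
    for x in receptors:
        graph.setdefault(source, {})[x] = 0.0
    for y in trs:
        graph.setdefault(y, {})[sink] = 0.0
    return graph


def dijkstra(graph, start, target, banned_nodes=frozenset(), banned_edges=frozenset()):
    """Least-cost start->target path avoiding banned nodes/edges; (cost, path) or None."""
    dist, prev, heap = {start: 0.0}, {}, [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, cost in graph.get(u, {}).items():
            if v in banned_nodes or (u, v) in banned_edges:
                continue
            if d + cost < dist.get(v, math.inf):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(heap, (d + cost, v))
    return None


def path_cost(graph, path):
    return sum(graph[u][v] for u, v in zip(path, path[1:]))


def yen_k_shortest(graph, source, target, k):
    """Yen's algorithm: up to k loopless source->target paths in order of increasing cost."""
    first = dijkstra(graph, source, target)
    if first is None:
        return []
    accepted, candidates, seen = [first], [], {tuple(first[1])}
    while len(accepted) < k:
        last = accepted[-1][1]
        for i in range(len(last) - 1):
            root = last[: i + 1]  # shared prefix ending at the spur node root[-1]
            # Ban the next edge of every accepted path that shares this root.
            banned_edges = {(p[i], p[i + 1]) for _, p in accepted if p[: i + 1] == root}
            spur = dijkstra(graph, root[-1], target, set(root[:-1]), banned_edges)
            if spur is None:
                continue
            path = root[:-1] + spur[1]
            if tuple(path) not in seen:
                seen.add(tuple(path))
                heapq.heappush(candidates, (path_cost(graph, root) + spur[0], path))
        if not candidates:
            break
        accepted.append(heapq.heappop(candidates))
    return accepted


def top_k_paths(weighted_edges, receptors, trs, k):
    """[(score, path)] for the k highest-scoring receptor->TR paths; score = product of w."""
    graph = build_cost_graph(weighted_edges, receptors, trs)
    return [(math.exp(-cost), path[1:-1])
            for cost, path in yen_k_shortest(graph, "s*", "t*", k)]


edges = [("R", "A", 0.9), ("A", "T", 0.9), ("R", "B", 0.5), ("B", "T", 0.5), ("R", "T", 0.1)]
for score, path in top_k_paths(edges, {"R"}, {"T"}, k=3):
    print(round(score, 2), path)
# 0.81 ['R', 'A', 'T']
# 0.25 ['R', 'B', 'T']
# 0.1 ['R', 'T']
```

Because every cost is non-negative, Dijkstra's algorithm is valid for each spur search, and exponentiating the negated path cost recovers the product of edge weights.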
We compute a pathway reconstruction Gk for each value of k by taking the union of the k highest scoring paths. By construction, the interactions in the k shortest paths are a subset of those in the (k+1) shortest paths, thereby ensuring that our reconstructions vary smoothly with k. For precision and recall calculations, we compute k=20,000 paths and rank each node and edge by the index of the first path in which it appears. This value of k reflects the high degree of redundancy (edge reuse) among paths in signaling networks.
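Building Gk and the rankings used for precision and recall from an ordered list of paths is straightforward. In this sketch (hypothetical names; paths are node lists, highest scoring first), each interaction and protein is assigned the index of the first path containing it, so the reconstruction for any k is the set of interactions with rank at most k and is automatically nested inside the reconstruction for k+1. The protein ranks would serve the protein-level evaluation.

```python
def rank_by_first_path(paths):
    """Map each edge and node to the 1-based index of the first path that contains it."""
    edge_rank, node_rank = {}, {}
    for index, path in enumerate(paths, start=1):
        for node in path:
            node_rank.setdefault(node, index)  # keep only the earliest index
        for edge in zip(path, path[1:]):
            edge_rank.setdefault(edge, index)
    return edge_rank, node_rank


def reconstruction(edge_rank, k):
    """Edge set of Gk: the union of the first k paths."""
    return {edge for edge, rank in edge_rank.items() if rank <= k}
```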
We constructed a directed human protein interactome from numerous protein–protein interaction and signaling pathway databases.3,
The National Institute of General Medical Sciences of the National Institutes of Health grant R01-GM095955 (TMM), National Science Foundation (NSF) grant DBI-1062380 (TMM), National Research Service Award F32-ES024062 (ANT), and Environmental Protection Agency grant EPA-RD-83499801 (TMM) supported this work. We also acknowledge funding from the ICTAS Center for Systems Biology of Engineered Tissues at Virginia Tech (TMM). We thank Brendan Avent and Peter Burnham for their help in preparing the manuscript.
Journal of the Indian Institute of Science (2017)