Knowledge-guided fuzzy logic modeling to infer cellular signaling networks from proteomic data

Modeling of signaling pathways is crucial for understanding and predicting cellular responses to drug treatments. However, canonical signaling pathways curated from literature are seldom context-specific and thus can hardly predict cell type-specific response to external perturbations; purely data-driven methods also have drawbacks such as limited biological interpretability. Therefore, hybrid methods that can integrate prior knowledge and real data for network inference are highly desirable. In this paper, we propose a knowledge-guided fuzzy logic network model to infer signaling pathways by exploiting both prior knowledge and time-series data. In particular, the dynamic time warping algorithm is employed to measure the goodness of fit between experimental and predicted data, so that our method can model temporally-ordered experimental observations. We evaluated the proposed method on a synthetic dataset and two real phosphoproteomic datasets. The experimental results demonstrate that our model can uncover drug-induced alterations in signaling pathways in cancer cells. Compared with existing hybrid models, our method can model feedback loops so that the dynamical mechanisms of signaling networks can be uncovered from time-series data. By calibrating generic models of signaling pathways against real data, our method supports precise predictions of context-specific anticancer drug effects, which is an important step towards precision medicine.

Existing methods for modeling signaling pathways fall roughly into two categories: data-driven and knowledge-driven. Data-driven methods have emerged as popular tools for systems biology of signaling networks 8,9 . For example, Cai et al. proposed a Bayesian Network (BN) framework to integrate transcriptomic data with phosphoproteomic data to identify the pathways transmitting the signals of diverse stimuli in rat and human cells 10 . However, data-driven methods have a few limitations, such as the requirement of a large dataset for reliable inference and weak interpretability of molecular mechanisms. Knowledge-driven methods collect annotations of biochemical reactions from the literature, and encode them into executable models, typically in the form of Boolean networks 11,12 or ordinary differential equations (ODEs) 13,14 . But current literature covers only parts of the genome-wide signaling network with limited mechanistic detail, and as such it is still impractical to build executable models for large-scale signaling networks with strong explanatory and predictive power.
Some phosphoproteomic datasets of cancer cells measured upon perturbations by chemical inhibitors, stimuli and knock-downs have been published 7,15 , which are valuable resources for uncovering real biochemical reactions and causal relationship among signaling proteins. However, the phosphoproteomic datasets available so far are mostly of small sizes and cover only a few stimuli and inhibitors, which restricts the applicability of data-driven methods. Therefore, some studies proposed hybrid approaches that combine the strengths of both knowledge-driven and data-driven methods. Specifically, using these methods we can manually curate signaling pathways from the literature and databases, use phosphoproteomic data to calibrate the network models and identify drug-induced rewiring of signaling pathways [16][17][18][19][20] . However, these existing hybrid methods do not explicitly model the feedback loops which are prevalent in signaling pathways. While linear regression-based hybrid methods 21 have been developed for gene regulatory network inference, they are not directly applicable to the inference of signaling networks due to the sparse coverage of signaling proteins in the proteomic datasets available.
In this paper we propose a hybrid knowledge-guided network model based on fuzzy logic for signaling pathway inference. To make use of prior knowledge, two regularizers were introduced to impose penalties on violations of prior knowledge regarding individual edges and the maximum number of proteins directly interacting with each signaling protein. In particular, the dynamic time warping algorithm 22 is adopted to measure the goodness of fit between experimental and predicted data, so that our method can model temporally-ordered experimental observations. Compared with existing methods of hybrid modeling, the contributions of our method lie in at least two aspects. First, a new knowledge-fused framework has been proposed which extends the qualitative informative priori proposed in 23 to quantitative priori. Second, we have extended the traditional linear regression used in knowledge-fused methods to time series alignment so that our method can recover signaling pathways with feedback loops and uncover the dynamical mechanisms of signal transduction processes. Extensive empirical experiments have been conducted to analyze the performance improvement by different ratios of prior knowledge mixed with noise on a synthetic dataset. Also, the performance of the proposed model was evaluated on two real datasets, namely a dataset from the DREAM 4 Predictive Signaling Network Modeling Challenge 24 , and the phosphoproteomic and cell fate data of the apoptosis pathway with rewiring induced by sequential treatment of anticancer drugs 15 . The experimental results demonstrate that our model can uncover the signaling pathway alterations that are absent in the prior knowledge network but supported by independent perturbation data. Furthermore, it can successfully infer signaling pathways with feedback loops so that the dynamical mechanisms of cell fate determination can be uncovered from time-series data.

Methods
Our knowledge-guided fuzzy logic modeling for signaling pathway inference consists of four steps. As shown in Fig. 1, we present the flowchart of our method using a toy signaling pathway as an example.
Building prior knowledge network. We first derive quantitative confidence scores from public knowledge bases of pathways to provide more accurate information than qualitative prior knowledge for the inference of the true network structure. The Pathway Commons database contains a large collection of annotations of biochemical reactions from various sources 3 , and each reaction is endorsed by a list of publications. The resource allows us to build a primary network which each edge can be assigned a quantitative confidence score. We adopt a scoring scheme similar to that proposed in ref. 25 to compute the prior confidence scores. In general, publications based on low-throughput techniques, due to a lower false positive rate, are considered to be more reliable than those based on high-throughput techniques, hence we assign higher confidence scores to low-throughput publications than high-throughput publications. First, each publication was classified as a high-throughput or low-throughput study according to the number of unique reactions supported by it. There are 56,193 publications and 4,228,150 reactions included in Pathway Commons, and the number of reactions endorsed in each of 90% publications is less than 111. Therefore, a publication associated with more than 111 reactions would be classified as high-throughput and otherwise low-throughput. Next, if a reaction is endorsed by only one type of study (either high-or low-throughput but not both) we assign confidence score according to Table 1. If a reaction is endorsed by both high-and low-throughput studies, the confidence score is equal to the confidence score of the low-throughput column in Table 1 plus 0.05 times the number of high-throughput endorsements. If the computed value of the confidence score exceeds 0.95, it is still assigned a maximum value of 0.95.
From the primary network with quantitative confidence scores, we extract a subnetwork that covers as completely as possible the receptors of interest, perturbed and measured proteins included in a phosphoproteomic dataset. However, the primary subnetwork may contain some non-identifiable elements that should be removed so that the network calibration against real data can be more efficient. For clarity, we define the manipulated or experimentally measured proteins as "designated" nodes, and other proteins as "undesignated" nodes. We extend the three rules proposed in 16 to compress the primary networks and adjust the confidence scores simultaneously: 1) A signaling cascade consisting of a series of "undesignated" proteins and reactions directed to a "designated" node should be compressed, while all "designated" nodes are reserved in the process of compression; 2) a linear cascade converging to dispatcher-like proteins should be compressed; 3) multiple outgoing signals diverging from a linear cascade in which no protein is manipulated or measured should be compressed. Whenever a node is removed and two tandemly associated edges are merged into one new edge, the smaller of the confidence scores associated with the two original tandem edges would be reserved as the confidence score of the new edge, because the edge with a lower score is the bottleneck of a pathway through which some flux can go. Figure 2 illustrates an example for each compressing rule, as well as the adjustment of confidence scores.
Fuzzy logic network modeling. To generate simulation data from the prior knowledge network, we adopt a fuzzy logic model to describe the effects of input signals on each node (activations or inhibitions). Compared with Boolean logic models, fuzzy logic models can describe the gradual responses and typical sigmoidal biological reactions that can not be modeled by Boolean networks. Here, we adopt the normalized Hill function 26 to transform the input signals to the output signal. Let a real-valued variable ∈ x [0,1] denote a single input signal, then its activation effect can be formulated as where p (> 0) is the Hill coefficient which is an integer and determines the steepness of the sigmoidal transition between the activation and inhibition states, and h (> 0) is the sensitivity parameter which is a real number and determines the midpoint of the Hill function. Note that the value of f(x) falls in the range of [0, 1], and thus the inhibition effect is accordingly defined as 1− f(x). In the case that multiple inputs exist, the AND gate is used if all inputs must be active to activate the output node, and the OR gate is used when the output node would be  activated as long as at least one input node is active. In particular, we give the AND gates a higher priority than the OR gates among multiple logical gates in a logical expression. This priority order corresponds to the disjunctive normal forms (DNF), i.e. Boolean variables or negated Boolean variables comprise a few AND clauses, and the AND clauses are in turn connected by OR operations. In fact, any particular Boolean function can be represented by one and only one disjunctive normal form 27 . Moreover, to extend the DNF to a fuzzy logic model, AND and OR gates are usually substituted by min and max operators so that continuous values can be handled. Given the Hill function and logical operations mentioned above, our goal is to infer the actual logical relationships from the real data. However, the number of possible logical combinations of n input signals is a super-exponential function (2 2 n ), hence it is computationally inhibitive to enumerate all possible logical combinations for even a moderate-sized network. In a specific cellular context, this problem can be simplified by assuming that a regulator can activate or inhibit a target protein alone or by synergizing with a specific set of co-regulators, although in a different cellular context this regulator may collaborate with other proteins. As such, among all possible combinations of multiple regulators, we focus on a subset of logical forms that cover a vast majority of biologically realistic cases. In terms of DNF, this assumption means that a Boolean variable would appear in one and only one AND clause regarding a certain objective variable. As a result, we propose a coding scheme that can represent all such logical combinations narrowed down by the assumption from all possible DNFs.
Formally, a matrix of binary variables R = (r 1 , … , r m ) is used to represent the regulation type (activations or inhibition), in which the element r ij is equal to 1 if the i-th node activates the j-th node, and 0 if the regulation type is inhibition. Another matrix C = (c 1 , … , c m ) is introduced where the j-th column c j = (c 1j , … , c mj ) T is an integer vector encoding the logical relationships among the input signals for the j-th node which is explained in the following. Assume that the value of c ij corresponding to the edge from node i to node j belongs to {1, 2, … , ||b j || 1 }, then for another incoming edge from say node k to node j (i.e. b kj = 1 and k ≠ i), we check whether c kj is equal to c ij . If c kj = c ij , then the AND logical relation is assigned between the two input signals. Note that ||b j || 1 is equal to the number of input signals of node j. Hence, by iteratively assigning each element in c j to an integer value from 1 to ||b j || 1 , we can enumerate all AND clauses. Thereafter, all the AND clauses are linked by OR operations to form the final logical relation. Mathematically, let z j t ( ) be the state of node j on the t-th iteration, and we update its state by calculating the following logical expressions based on the states of its parent nodes on (t− 1)-th iteration: Note that the term in the square bracket is equal to if r ij = 0 (inhibition effect), respectively. As an illustration, we present a few examples of the logical expressions regarding the 6-th node in the toy pathway, as shown in Fig. 1

(b).
Knowledge-guided network inference. In this section, we present a method to calibrate the prior knowledge network by fitting simulated data generated by the fuzzy logic model with real data, thereby inferring context-specific signaling network models. A signaling network is often represented as a directed graph G = (V, E), where the set of vertices (or nodes) V represents a collection of m signaling molecules, and the set of edges E represents the biochemical reactions between the signaling molecules.
the confidence scores of the edges incident on the j-th node (j = 1, 2, … , m hereafter). Let B = (b 1 , … bm) be the adjacency matrix of the graph to be inferred, in which the j-th column b j = (b 1 , … , b mj ) T is a binary vector encoding whether a node is a parent of the j-th node. Assuming  ∈ + ⁎ g j represents the knowledge-derived threshold of the maximum number of incoming edges incident on the j-th node, our knowledge-guided network inference is formulated as the following optimization problem: where n y and n z are the numbers of experimental and predicted samples, respectively. The latter two terms, is the number of incoming edges (i.e. indegree) of the j-th node, are regularizers that impose penalties on violations of prior knowledge and network complexity. The two parameters γ and λ are both non-negative real variables introduced to balance the goodness of fit, violations of prior knowledge and model sparsity.
Mukherjeea and Speed 23 proposed a qualitative informative prior: in which E + is the set of prior edges and E − is the complement edge set of E + . Note that the second term j , which essentially corresponds to Equation (4) and extends the qualitative prior to quantitative prior. The third term imposes a penalty on the j-th node if and only if the number of its incoming edges exceeds the knowledge-derived threshold ⁎ g j . The underlying rationale is that, although the number of regulators of a signaling molecule may be big in a prior knowledge network, the signaling molecule is regulated by a very small number of regulators in a specific cellular context. In fact, limiting the number of incoming edges has been shown to help yield biologically more realistic network models 28,29 . The free parameters γ and λ represent the weights of the two regularizers. Suppose that we hold equal confidence in the two types of priori, γ can be set equal to λ and thus the number of free parameters is reduced from two to one.
Scientific RepoRts | 6:35652 | DOI: 10.1038/srep35652 The calculation of the loss function depends on whether the experimental data are steady-state or dynamical time-series data. If each experimental observation is obtained when a steady state is reached upon a specific perturbation, predictions are made by simulating the virtual perturbations on the candidate signaling network and running the dynamic simulation till a steady state is reached. In such a scenario, the loss function can be formulated as the mean squared error (MSE) between the experimental and predicted data: MSE j j j j 2 2 In some cases, the experimental data consist of a series of temporally-ordered measures of the activity levels of signaling molecules, e.g. experimental observations at 8 discrete time points within 48 hours after the exposure of breast cancer cells to combinatorial drugs 15 . Accordingly, we can simulate the dynamic cellular responses to drug perturbations on the candidate signaling network, whereby time series are generated to describe the dynamic activity levels of signaling molecules. Since the simulated time-series data can be much denser than the experimental data, i.e. n z > n y , the traditional mean squared error loss function is not applicable. Therefore, we exploit the temporal alignment method, dynamic time warping (DTW) 22 , to measure the divergence between the experimental and predicted time series, as illustrated in Fig. 1(c). Formally, let S be the number of steps needed to align the two time series, and let ( ) ( ) be the correspondence indexes representing the aligned compositions between y j and z j , then the loss function is defined as ). The dynamic time warping problem can be efficiently solved by using the dynamic programming algorithm to find the optimal correspondence vectors in linear time and space 30 .
To simulate data from a candidate network, the fuzzy logic model described in Section "fuzzy logic network modeling" is used to compute the state of each node, and generate simulation data for given input signals. According to the type of experimental data, we employ two different configurations to simulate steady-state or temporally-ordered data respectively. For steady-state experimental data, the values of the receptor nodes of each sample is taken as input and we run the simulation till a stable state is reached or a predefined maximum iterations is exceeded, and thus generate equal number of simulation data points as the experimental data. The stable state is judged by checking whether the absolute difference of two successive simulated states is less then a small threshold ε. In our implementation, ε is set to 1.0e-6, and the maximum iterations is heuristically set to 5 times of the number of network nodes.
For temporally-ordered experimental data, simulation data can be generated by successively choosing some nodes to update their states and repeating the update process. In particular, two modes, synchronous and asynchronous, can be used to simulate data from a candidate network. For synchronous mode, the states of all nodes are simultaneously updated at each time point, and the successively generated values constitute the simulated time-series data. For asynchronous mode, only one node are randomly chosen to update its state at each time point. Since asynchronous mode is computationally inhibitive for even moderate-sized networks, we employed the synchronous mode in our implementation. Due to the fact that feedback loops contained in a network would lead the simulation to oscillations, i.e. we can never reach a stationary stable state. Therefore, we run the simulation to generate time-series data and stop the simulation process when the number of time points is three times the experimental number of time points in the real data.
Our main task is to infer the values in the matrices B, C and R from the experimental data. Since all the variables to be inferred are integers, we reformulated the minimization problem defined in Equation (3) into a constrained nonlinear integer programming (NLIP) problem, which is formally defined as follows: in which  + 0 represents the set of non-negative integers. We utilize the genetic algorithm (GA) included in the MATLAB optimization toolbox (MATLAB R2012b release) to solve the NLIP problem, considering that the GA is often able to find the approximate global optimum in most cases. We ran the algorithm 50 times so that the global or nearly global minimum is likely to be found, and the optimum was chosen as the final result. For the Hill coefficient p and activation coefficient h in Equation (1), we empirically set their values using the grid-like search strategy that have been used in previous studies 26,31 . As a result, we empirically set p to 2, and specified h to 0.2, 0.3, … , or 0.8. Such configurations can encode a large number of different models to fit to the real data. After testing the models with other values of p and h, which allows us to infer the parameters in a larger solution space, we did not find that additional configurations could significantly improve the goodness of fit to the real data than the aforementioned subset of the specified values. Data Availability. The source code and datasets are available at the GitHub site: https://github.com/hliu2016/ fuzzy.

Results
Empirical analysis of synthetic data. To evaluate the contribution of prior knowledge to the inference of signaling pathways, we analyzed synthetic data to estimate the effect of varying ratios of the number of prior edges to the total number of edges in a network, as well as different ratios of noisy edges, on the reverse engineering of network structures. The signaling pathway used to generate the synthetic data is shown in Fig. 3(a). It comprises 9 nodes and 10 edges, and there is a negative feedback loop c → h → j ⊣ c. We used the R package of BoolNet 32 , which is an implementation of Boolean network, to produce 5 time-series datasets. Each time-series dataset includes 10 successive states along a trajectory from a random initial state. To mimic the discrete experimental observations, we randomly extracted 6 or 7 states (the initial state was included) from each time-series dataset to build the synthetic data. For each time series, we predicted 10 successive states from the initial state based on a candidate network and then used the dynamic time warping algorithmto align the predicted time series with the synthetic time series.
Prior edges significantly improve performance. We first investigated whether the prior knowledge could improve the performance of our model in predicting time-series data. For this purpose, we used varying ratios (0-100%) of true edges (i.e., prior edges) as prior knowledge to guide the model inference. The ratio of the prior edges is defined as # selected true edges total # true edges . For each ratio, the prior edges are randomly selected from the true network topology, and they are assigned confidence scores randomly generated by drawing from a uniform distribution between 0 and 1. The maximum number of incoming edges for each node was set to 2, and the parameters γ and λ were both set to 0.5. We used the structural distance (SD) between the learned and the true networks, where the SD is defined as the total number of additions and deletions of edges needed to convert the learned network into the true network. As such, smaller values of SD indicate better performance in recovering the true network structure. To avoid potential bias in selecting the prior edges, we repeated the procedure 20 times for each specific ratio. Figure 3(b) shows the boxplot of SD with respect to varying prior ratios. We found that the accuracy was significantly improved when the ratio of prior edges increased. For example, as shown in Fig. 3(b), the mean of SD at 50% prior is significantly less than that at 0% prior (t-test, p-value ≤ 0.0052). These results demonstrate that prior knowledge can significantly improve the performance of signaling pathway inference.

Noise hinders performance improvement by prior knowledge.
To assess the robustness of the contribution of prior knowledge to the performance improvement, we introduced different ratios of false positive edges (i.e., noisy edges). The ratio of noisy edges is defined as # false positive edges total # true edges . The confidence scores for both true and false positive edges were randomly drawn from the uniform distribution between 0 and 1. Similarly, the procedure was repeated 20 times to avoid bias in edge selection. As shown in Fig. 3(c,d), the mean of SD increases gradually with the ratio of noisy edges, indicating that noisy edges can hinder the performance improvement by prior knowledge. Note that, with the ratios of noisy edges near 1.5 in Figs 3(c) and 4 in Fig. 3(d), the mean SD values (≈ 6) corresponding to 50% prior and 100% prior edges respectively are approximately equal to that learned without any prior information or noise (i.e., 0% prior edges and 0% noise), as shown by the red dashed line in Fig. 3(b-d). This observation suggests that the prior knowledge would be helpful in uncovering the true network unless multifold noisy edges are introduced into the prior knowledge, indicating the robustness of the performance improvement by using prior knowledge.
Evaluation on steady-state data. A phosphoproteomic dataset of human liver cell line (HepG2) was used for the DREAM 4 Predictive Signaling Network Modeling Challenge 24 . Based on the dataset, we conducted performance comparison of our method with existing data-driven and hybrid methods. The phosphoproteomic dataset includes 25 samples measured in steady states, and each phosphorylation level was normalized to the range (0, 1) by the logistic transformation proposed in 16 . From the original signaling pathway provided by the DREAM 4 organizers, we derived a compressed network as shown in Fig. 4(a), in which the stimulated nodes are colored in green, the inhibited nodes in red border line and the measured nodes in cyan. We calculated the confidence score for each edge of the simplified network by using the method described in Section "Building prior knowledge network". As shown in Fig. 4(a), the width of each edge is proportional to its confidence score, and the maximum number of incoming edges for each node was set to 3.
Performance comparison. As data-driven methods can only infer the links limited to perturbed and measured proteins which constitute only a subset of the proteins of interest, the resulting networks often have weak biological interpretability. For example, we ran the Bayesian network algorithm MMHC 33 and markov blanket algorithm HITON_PC 34 included in the MATLAB toolbox Causal Explorer 35 to infer the signaling pathway, and found that the resulting networks were difficult to interpret, as they include edges between perturbed and measured proteins and are very different from the original signaling pathway provided by the DREAM 4. Therefore, we focused on the comparison with another hybrid method proposed in ref. 19, which was built on the method with the best performance out of twelve groups participating in the DREAM 4 challenge 8 . Given the prior network with confidence scores as input, we trained our model to fit the experimental data, and the parameters λ and γ were empirically set to 1.2, which achieved a good balance between the LSE (least square error) and the penalty term in Equation (3). The resulting network contains 16 edges, as shown in Fig. 4(b). Moreover, the LSE of each measured protein is presented in Fig. 4(d). Based on the prior network, three edges (IGF1 → AKT, PI3K → ERK12 and IL1a → MEK12) were added to our learned network, and an AND logic gate (MEK12 ^ ¬ IL1a → ERK12) was derived. Compared with the network in Fig. 4(c) as learned in ref. 19, our method achieved better network sparsity and goodness of fit to the real data, which were the two main measures for performance evaluation in the DREAM 4 challenge. For clarity, we converted the LSE into MSE values as defined in ref. 19 using the formula Statistical significance of learned model. To test if the learned model is statistically significant given the experimental data and prior network, we generated a set of random networks. Each random network was generated according to the following rules: 1) it should have the same numbers of nodes and edges as the prior network; 2) each node is incident to at least one edge and each stimulated node has no incoming edge but at least one outgoing edge; 3) the confidence scores are randomly selected from the original prior scores. The randomly generated networks are taken as the "random prior knowledge" to guide the models, and the procedure is repeated 500 times. We found that the random prior networks yielded significantly larger LSE values than the true prior network (z-score = 3.5675, p-value < 0.000183). The results indicate that randomly generated models are unlikely to fit the experimental data with comparable goodness of fit as the knowledge-guided model.

Evaluation on single-cell time-series data.
It has been demonstrated that the triple-negative breast cancer cell line (BT-20) is sensitive to sequential and time-dependent combinatorial drug treatments using the EGFR inhibitor Erlotinib and DNA damage inducer Doxorubicin 15 . It was observed that the BT-20 cells did not show significantly apoptotic response when treated by (1) either drug alone, or (2) both at the same time, or (3) Erlotinib shortly before Doxorubicin. Strikingly, if Erlotinib was added at least 4 hours, especially 24 hours, prior to adding Doxorubicin (denoted as E → D treatment), the cancer cells showed as much as 500% increase in cell death compared to the reverse order of drug usage or using Erlotinib alone. The sequential and time-dependent drug treatment is supposed to rewire the signaling pathways and trigger the signaling cascade of apoptosis.
Therefore, we used our method to detect the alterations of signaling pathways induced by the combinatorial drugs. As shown in Fig. 5(a), the model of the prior network comprises mostly the signaling proteins involved in apoptosis and DNA damage response. In particular, the "DNA damage" node represents the DNA damage signal induced by Doxorubicin. The three pink nodes labeled with "Apoptosis", "Proliferation" and "Cell cycle" represent the cell fates that can be attributed to the activities of the signaling pathways. The single-cell phosphoproteomic dataset published in 15 , which is used here for model calibration, consists of the phosphorylation levels of 20 signaling proteins (cyan nodes) over 8 discrete time points. Moreover, the cell counting corresponding to the three types of cell fates was measured by flow cytometry, which allows to evaluate our method by checking whether the inferred signaling network model can capture the dynamic trends of cell fates induced by the drug treatment.
First, we used the single-cell time-series data of the BT-20 cell line upon the E → D treatment to calibrate the prior network. In particular, we ran the simulations using the fuzzy logic network model (as described in the Methods section) with the synchrnonous updating scheme till a steady state was reached or a predefined maximum number of iterations was exceeded. To conduct the temporal alignment, a sliding window moving along the simulated continuous time-series data was used to extract a fragment of data, which was taken as input by the aforementioned dynamic time warping (DTW) algorithm to align with the experimental data. The size of the sliding window was set to three times the experimental data, i.e. 3 × 8 = 24 time points. The learned network is shown in Fig. 5(b), which contains less edges than the prior network. It is worth noting that the cell fate data were not used in the learning stage, so that the model evaluation in terms of cell fate prediction would be more objective than direct comparisons of phosphorylation levels. For prediction and validation, however, the three cell fate nodes were added into the learned network by linking to them from the upstream signaling molecules according to the prior network. Secondly, we ran simulations to arrive at steady states and then counted the number of truths for each type of cell fate. Because the cell fate nodes had not been included in the fuzzy logic network model for inference, we did not have the parameter values for the edges linking signaling nodes to the cell fate nodes. To reduce the dependency on parameters, we converted the fuzzy logic model to Boolean network for the prediction of cell fates. To this end, we binarized the phosphoproteomic data of the 20 measured signaling proteins as their initial states, and generated random Boolean initial states for other proteins. Then, the initial state was taken to simulate the dynamics of both the prior and trained networks using the BoolNet software 32 . Note that, for dynamic simulations, our learned fuzzy logic network can function as a Boolean network when given Boolean inputs. Corresponding to each of the 8 time points, we repeated the procedure for 10,000 times and recorded the number of truths in steady states for each type of cell fate. Finally, we computed the percentage of each type of cell fate at each time point, and plotted the curves connecting the dots corresponding to the percentages to describe the dynamic changes of cancer cell fates.
As shown in Fig (T1, T2, T3, T4, T5, T6, T7, T8). The predicted cell fates are based on the phosphoproteomic data and thus comprise 8 time points, as shown in Fig. 5(c). Considering potential time delays between the drug-induced changes of signaling pathways and experimentally observed changes of cell fates, we excluded the middle three time points (T2, T3 and T4) of the predicted data and then aligned the rest time points to the experimental data. As a result, we can calculate the correlation coefficients between the experimentally observed cell fates and predicted cell fates. For apoptosis and proliferation, the Pearson's correlation coefficients were 0.8968 vs. 0.7292 and 0.8698 vs. 0.7902 for the trained network vs. prior network, respectively. For cell cycle, the trained network and prior network achieved about equal Pearson's correlation coefficients of 0.6352 and 0.6186. The results indicate that our method effectively calibrated the prior network model by using the phosphoproteomic time-series data, thereby making more accurate predictions of cancer cell fates.

Conclusion and Discussion
Although purely data-driven methods have achieved remarkable success in modeling biological networks, the results are often difficult to interpret in terms of molecular mechanisms. With the accumulation of domain knowledge, we expect to integrate the knowledge and experimental data to produce executable mathematical models, and thereby uncover hidden mechanisms of cellular phenotypes. We have high confidence in the domain knowledge and expect that the models calibrated against real data would not diverge too much from the Scientific RepoRts | 6:35652 | DOI: 10.1038/srep35652 knowledge-based model. In particular, we extended the qualitative informative priori proposed by Mukherjeea and Speed 23 to quantitative priori. In many situations, our confidence in prior knowledge is not entirely definite (e.g. true or false), but often uncertain. For instance, literature-derived evidence for different phosphorylation reactions in a specific cancer cells are often not equally strong, i.e. some reactions are strongly supported while other are only moderately supported by previous findings. Therefore, real-valued fuzzy confidence scores (ranging from 0 to 1) of prior knowledge possesses more descriptive power than binary prior information.
To the best of our knowledge, we are the first to introduce the time-series alignment technique into the signaling pathway inference. As such, our method can handle not only steady-state experimental data, but also temporally-ordered experimental data, whereas previous studies can only address steady-state experimental data 8,26,31 . Furthermore, the capacity of handling time-series data allows us to recover signaling pathways with feedback loops and reveal dynamical mechanisms of signal transduction. It is known that feedback loops in a network often lead to oscillatory states which is beyond the capacity of previous methods that depend on linear regression of steady-state data.
As fuzzy models can describe the gradual responses and typical sigmoidal biological reactions, we adopted the fuzzy model rather than Boolean model. The normalized Hill function is widely used in kinetic modeling of biochemical reaction networks and its two parameters p and h are simply determined by a grid-like search used in previous studies 26,31 . To verify the validity of the search strategy, we explored the impact of p and h on the MSE, using the DREAM 4 dataset (for more details see subsection titled "Evaluation on steady-state data"). To capture typical sigmoidal biological reactions, the value of p should be greater than 1, but a large value of p would lead to  , h), the optimal MSE was obtained by running the genetic algorithm for 50 times. As shown in Fig. 6, the MSE decreases gradually with the increase of h, but the MSE rises rapidly when h is greater than 0.8 except for the linear transfer (p = 1). Hence, we specified the value of h to 0.2, 0.3,… , 0.8, which can be optimized along with other variables by the genetic algorithm. To capture typical sigmoidal biological reactions, p was set to 2. Besides, it looks that the magnitudes of impacts on MSE of the two parameter p and h are much less than the model parameter λ, as shown in Fig. 4(d). Therefore, we focus on adjusting the value of λ to balance the MSE and the penalty on the violation of prior knowledge in our empirical experiments.
An important objective of our paper is to construct cellular context-specific network models for cancer cell fate decisions, especially apoptosis induced by drug treatments. The hybrid model proposed in this paper is an important step toward this goal, by calibrating generic models of signaling pathways against quantitative phosphoproteomic data measured in cancer cells. However, in real cancer cells the rewiring of signaling pathways must undergo the transcriptional programs and metabolic processes before the induced cell fates can be observed, i.e., there is a time delay between the drug-induced changes of signaling pathways and experimentally observed changes of cell fates. Since the predicted cell fates in this paper were directly based on the signaling networks, there is a sharp rise of the proportion of dead cells, as shown in Fig. 5(c), whereas the experimentally observed proportions of cells committing suicide (e.g. apoptosis) change gradually and smoothly, probably due to the time lags from the signal transduction to phenotypic changes. Therefore, we plan to link the signaling pathways and transcriptional regulatory networks, thereby taking the time delays into account. Moreover, we will incorporate genomic and transcriptomic data of individual patients, such as somatic mutations, copy number variations and gene expressions data, to further calibrate the models, so that it can precisely predict the context-specific anti-cancer drug effects, thereby help pave the way for the precision medicine.