A Mathematical Model of the Phosphoinositide Pathway

Phosphoinositides are signalling lipids that constitute a complex network regulating many cellular processes. We propose a computational model that accounts for all species of phosphoinositides in the plasma membrane of mammalian cells. The model replicates the steady-state of the pathway and most known dynamic phenomena. Sensitivity analysis demonstrates model robustness to alterations in the parameters. Model analysis suggest that the greatest contributor to phosphatidylinositol 4,5-biphosphate (PI(4,5)P2) production is a flux representing the direct transformation of PI into PI(4,5)P2, also responsible for the maintenance of this pool when phosphatidylinositol 4-phosphate (PI(4)P) is decreased. PI(5)P is also shown to be a significant source for PI(4,5)P2 production. The model was validated with siRNA screens that knocked down the expression of enzymes in the pathway. The screen monitored the activity of the epithelium sodium channel (ENaC), which is activated by PI(4,5)P2. While the model may deepen our understanding of other physiological processes involving phosphoinositides, we highlight therapeutic effects of ENaC modulation in Cystic Fibrosis (CF). The model suggests control strategies where the activities of the enzyme phosphoinositide 4-phosphate 5-kinase I (PIP5KI) or the PI4K + PIP5KI + DVL protein complex are decreased and cause an efficacious reduction in PI(4,5)P2 levels while avoiding undesirable alterations in other phosphoinositide pools.


Background
We assume that phospholipids may move freely across the cell membrane and that phosphoinositides are in the inner leaflet of the plasma membrane, as reported by van Meer 1 and Fadeel 2 . Model reactions take place at the inner leaflet of a 1 μm 2 patch of plasma membrane and in an adjacent region of the cytoplasm, with a height of 0.01 µm. We assume that influxes and effluxes of lipids by diffusion from or to adjacent patches are balanced. Within this 3D environment, we suppose that the enzyme kinetics in the model follow generalized mass action processes. We do not account for the binding of cytoplasmic enzymes to the membrane prior the initialization of their catalytic activity.

Transport fluxes
Transport fluxes reflect vesicle and non-vesicle-mediated transport [3][4][5] . It is unclear what percentage of the total transport is vesicle-independent. The transport can be accomplished by specialized proteins, called lipid transport proteins (LTPs), or can happen spontaneously at membrane contact sites (MCSs). We assume that these effluxes have first-order kinetics (fi= 1) and have one common rate constant. This last assumption is not trivial. For example, some species of phosphoinositides are necessary for the initiation of vesicle formation so they might have a slightly higher exit rate. At the same time, non-vesicle mediated transport is facilitated by LTPs, and LTPs with varying affinities for different phosphoinositides are presently not known. Overall, we do not possess enough information to include different efflux rate constants for each model variable.
Alberts et al. 6 reported that a macrophage ingests 3% of its plasma membrane per minute, while a fibroblast only ingests 1%. As a compromise, we set the amount of internalized plasma membrane due to endocytosis as 2% per minute. We added 2.5% for non-vesicle transport, resulting in a total of 4.5% per minute. The input flux values were restricted in order to allow 4.5% of the membrane phosphoinositides to recycle per minute. This setting made the levels of PI robust to alterations in other phosphoinositide pools. We also tested a level of 1% per minute. The model exhibited a similar behaviour, but the PI pool fluctuated considerably in response to alterations in other phosphoinositide pools. For example, manipulations in the PI(4)P and PI (4,5)P2 levels caused an undue increase of more than 15% in the PI pool. Of course, it is imaginable that such fluctuations could exist and have a physiological meaning, for instance, to signal to the kinases responsible for PI(4)P and PI (4,5)P2 production that these lipids are being depleted. However, we found no reports in the literature of this phenomenon, and therefore chose the model setting corresponding to a more stable PI pool.

Parameter Estimation
The phosphoinositide literature is vast. However, many experimental results are presented in ways that are difficult, if not impossible, to translate into numerical values of model variables or relative changes. Other publications present results that involve manipulations or perturbations of genes or proteins that are not included in our model. In the end, the experimental data used in the following to parameterize the model consist of all data from the literature that permitted reliable translation into model variables or changes in model parameters.
Each flux vij contains three parameters, namely a rate constant j, one or more kinetic orders fij, and the available quantity of enzyme that catalyses the reaction in this flux (Eij). These three types of parameters need to be estimated to populate the model equations.
If parameter values are found in the literature, they are typically given as Km and specific activity values. Many of such values have been collected in the database BRENDA 7 . While BST does not use these parameters, they are easily converted into values for rate constants and kinetic orders of GMA model 8 . Enzyme kinetic parameters retrieved from literature and BRENDA are shown in Table ST4.
Km and specific activity values collected in BRENDA are in mM and µmol/min/mg respectively. Before using these parameters in the estimation of rate constants and kinetic orders, Km's were converted to molecules/µm 2 and the specific activities to molecules/min/mg. While converting volume units to area units, we considered that the number of molecules present in a 1 µm 2 membrane patch is the same number present in a 1 µm 3 volume including the membrane patch and a thin layer of surrounding cytosol. Table ST4 also presents the known cellular localization of each enzyme as retrieved from UNIPROT annotations and described by Sasaki et al. 9 . Since the model aims to simulate a patch of the plasma membrane, enzymes should be considered if they are present in the cytosol or in the plasma membrane. This is true for all enzymes except the three SAC phosphatases. However, there is evidence that these phosphatases can control directly the levels of PI4P at the plasma membrane by acting at membrane contact sites between the ER and the plasma membrane 10 .
For enzymes that are present in multiple cellular compartments it is difficult to assess their relative contributions to the catalysis of a given reaction in each of the compartments. This lack of information is not critical in our approach since the parameters defining the amount of enzyme catalysing each flux in the plasma membrane patch (Eij) are numerically adjusted to optimize the model fit to the experimental data.
The parameters in Table ST4 could not directly be applied to the model since there is no one-toone relationship between each enzyme and each flux. Some enzymes catalysed more than one flux and many fluxes may be catalysed by more than one enzyme. Because quantitative details regarding these multiple processes are lacking, enzymes catalysing the same reaction were grouped using information in Balla 11 and Sasaki et al. 9 .
For each group, one enzyme was chosen to define the flux parameters. The chosen enzyme was normally the one identified in the literature as the main catalyst of the respective flux. The enzymes and associated fluxes are presented in in Table ST5 for kinases and in Table ST6 for phosphatases.
Generally, more information is available for kinases than for phosphatases. Moreover, there is a difference concerning the assignment of kinases and phosphatases to different fluxes. As Delage et al. 12 observed, kinases are often specific while phosphatases are polyvalent. Of the ten fluxes catalysed by phosphatases, seven can be catalysed by synaptojanins (SYNJ) and five by suppressor of actin (SAC) 9 . This redundancy is in stark contrast with the kinases that catalyze one or two reactions at most. An exception is phosphoinositide 3-kinase (PI3K), which is polyvalent if one allows for primary and secondary reactions. It is also noteworthy that PTEN, probably the most studied phosphatase as a consequence of its role as tumour suppressor, is specific for PI (3,4,5)P3 and PI(3,4)P2 substrates 13 .
Fine-tuning of the model was accomplished by adjusting the levels of enzymes (Eij) and fluxes entering and exiting the system (0, 3, 4, i). These parameters were manually tuned until the model reached a steady state where phosphoinositide levels agreed with values reported for the cell membrane (Table ST1) and model simulations replicated observed phenomena (Table  ST2 and Fig. 3). Most of the remaining parameters, that is, rate constants and kinetic orders derived from the literature were not altered. PIP5KII parameters had to be adjusted to allow a steady-state composition compatible with literature values. This enzyme catalyses the flux v545 11 9 . Using the original parameters for this enzyme would make v545 rather slow, yielding an accumulation of PI(5)P at around 700 molecules per µm 2 , which is outside the intervals reported in the literature. One could slow down the sources of PI(5)P in order to obtain a smaller pool, but that would reduce the turnover of PI(5)P which is supposed to be high 11 , and the model would no longer replicate the observations regarding PI(5)P, PI(3)P and PI (3,5)P2 documented in the literature. Further, using the reported PIP5KII parameterization found in the literature 14 , PI(5)P and v545 would not sustain the PI(4,5)P2 pool. We also had to re-estimate two rate constants, 35 and 355. Values retrieved from BRENDA were too small to replicate observed phenomena. For example, Bulley et al. 15 stated that the majority of the PI(5)P pool is formed from PI(3,5)P2. With the values retrieved from the literature 16 v05 would have a greater contribution to the PI(5)P pool than v355. Increasing 355 forced us to increase 335, which subsequently increased the PI_Kfyve activity when PI(3)P was used as substrate rather than PI. The increase of 355 could be explained by a regulation that is not implemented in the model: the activation of MTMRs by PI(5)P 17 . The amount of PIP5KII in the model is the highest as kinases are concerned. There is evidence in the literature of higher abundance of PIP5KII than PIP5KI 18 .
After the manual adjustments, as described above, a genetic algorithm was used to search for a parameter set that produced the closest fit to a set of observed phenomena.
The implemented genetic algorithm has two mechanisms to create diversity: mutation and recombination. The mutated parameters are obtained multiplying the initial value by a random value from a normal distribution of mean 1 and standard deviation of 0.3 or 0.5. Each new parameter set is only accepted if a set of validity conditions is met (Table ST7). Higher standard deviation values would make the algorithm run very slowly due to a high number of invalid sets. Recombination combines two parameter sets and thereby generates a new set where each parameter is the average of the corresponding parameters in the two parent sets.
Each generation starts with 10 distinct parameters sets. In the first generation, the initial set consists of the manual solution and 9 viable mutants are obtained from this parent set. All pairs of these 10 sets are combined producing 45 recombined sets. Additional 10 parameter sets are created by mutating all enzyme levels, inputs and outputs parameters from 10 of the existing 55 sets picked randomly. Next, 35 new sets are created recombining a random mutant with one of the previous 55 sets. The resulting 100 sets are the source for further 100 minor mutant sets that alter between 1 and 5 parameters of the 100 existing sets. The final 200 parameter sets are then scored and the 10 sets with the best score are selected to start the next generation. The criteria for scoring the sets are presented in Table ST7.
Two versions of the genetic algorithm were employed: one where the 10 initial progenitor sets of each generation could be part of the next generation and one where they were excluded. The former algorithm converged rapidly but the latter found best scores more quickly. In the long run, both algorithms delivered sets with similar scores that were 50% better than the manually determined set. Final parameters for the fluxes and protein amounts per µm 2 are shown in Table ST3.

Sensitivity Analysis
Local sensitivity analysis was implemented as described in Chen et al. 8 and combined sensitivities, also known as global sensitivities, were computed as detailed in Kent et al. 19 .
Parameter sensitivities were assessed numerically by increasing each parameter, one at a time, by 1% and computing the new steady state of the system. When the relative change in the steady-state value of a dependent variable is higher than 1% (or lower than -1%) the sensitivity indicates that a change in the parameter value is amplified in the steady-state of the dependent variable. Smaller sensitivities indicate attenuation of a perturbation. Although there are exceptions, most biological systems are expected to show sensitivity values between about -3 and +3. However, in signalling systems, the sensitivities may be much higher.
The model contains 64 parameters but the sensitivity analysis can be reduced to 46 parameters, because there are 21 pairs of rate constants (ij) and a corresponding enzyme quantity (Eij) that always appear as a product in a flux equation. Multiplying a constant to Eij or to the corresponding ij will give the same relative change in flux and, thus, in steady state. Consequently, the sensitivities for the rate constants are identical to the corresponding enzyme quantity.
The results of the sensitivity analysis are conditional on the parameter set that provides realistic steady-state values and replicate the observed phenomena. Furthermore, they reflect responses to changes in individual parameters. However, multiple small or intermediate errors in parameter values could collectively have a larger impact on the performance of the system. To address this question, combined sensitivities were studied using a Monte-Carlo approach. Specifically, we assigned a random uncertainty of ±5%, ±10%, ±20%, ±50% or ±100% for every parameter. With every combination, we created a new parameter set and recorded those combinations that led to a steady state in up to a simulation time of 1000 minutes. With this method, 5,000 new parameter sets were retrieved for each level of uncertainty. For each parameter set sampled we calculated the local sensitivities.

Sensitivity Analysis Results
The complete list of local sensitivities is presented in Table ST8. Only 20 out of 368 (~5%) computed sensitivities have an absolute value greater than 1. The highest sensitivity value indicates an increase of 5.7% in PI(3,4)P2 when f34534 is increased by 1%. The parameter f03 has the highest number of high sensitivities, and these are related to PI(3)P, PI(5)P and PI(3,5)P2.
The high proportion of low sensitivities shows that the model system is very robust to environmental and mutational challenges that perturb a parameter value. It also implies that small errors in model parameterization will not affect model behavior in a significant manner.
We intended to investigate whether the distributions of sensitivities might suggest a different behavior associated with a parameter change than the one suggested by the local sensitivities. Generally, if the sensitivity of a parameter exhibits large variability, it might suggest a lack of system robustness 19 .
The combined sensitivity results suggest that most parameter sensitivities are concentrated closely around the local sensitivity value. 112 out of 368 sensitivities (30.4%) led to a low overall sensitivity, even if the parameters were allowed to vary up to ±100%. Of the 256 that presented higher combined sensitivities, only 42 exhibited this behaviour with uncertainties lower or equal than 50%.
Considering these results, we can conclude that small changes in parameter values do not alter the sensitivity profile of the dependent variables. These observations suggest that the model is robust with respect to uncertainties in parameters.
The result is consistent with the earlier traditional sensitivity analysis. All parameters that in the combined sensitivity study cause high sensitivities in five or more dependent variables are also presented in the individual sensitivity analysis, except for f450, f045 and f455. This suggests that the model was parameterized so that v045 and v450 fluxes have a small influence, although their potential for high sensitivities is considerable.

Identifiability Analysis
If a model is only slightly perturbed even in response to large changes in a parameter value, does it mean that the parameter value is correct? This question becomes complicated if the sensitivities of two or more parameters are correlated and the increase in one can be compensated with an alteration in one or more others. The issue is related to the existence of infinite parameter combinations that produce essentially the same model output. This redundancy is especially important if we want to suggest an experimental design to populate the model with parameters values. These issues are related with the problem of parameter identifiability, which is defined as the ability to identify the true value of a model parameter 20 .
To find the best identifiable parameters we implemented the method described by Srinath and Gunawan 20 and Yao et al. 21 , which is based on the local sensitivity matrix. For each column of the matrix, the Euclidian norm is calculated and the column with the highest magnitude is selected. If the magnitude exceeds a certain threshold, the parameter corresponding to this column is identifiable. This column is removed from the local sensitivity matrix. The projection of the removed column on the remaining columns is computed and subtracted from them. This procedure creates a new local sensitivity matrix. The process is repeated until the highest magnitude is below the threshold. All remaining parameters are considered non-identifiable.

Results of the Identifiability Analysis
In the phosphoinositide pathway model, the best-identifiable parameters are f40, f335, f545, f45345, f355, f434, f454 and finally i (Fig. SF1d). These parameters constitute a subset of those parameters that exhibited high local sensitivities. The parameters that were considered non-identifiable either have very small sensitivities, or their effect on the dependent variables may be replicated by a linear combination of perturbations in identifiable parameters.

Monte-Carlo Exploration of the Parameter Space
The identifiability analysis suggests the existence of alternative parameter sets that replicate both, the reported steady-state levels of phosphoinositides and the observed relationships between steady-state levels. To identify possible alternative parameter sets satisfying these conditions, we explored 79,993 random parameter sets using a Monte Carlo approach. Only the most identifiable parameters (discussed in the previous section) plus the input and output fluxes were allowed to vary between 50 and 150 per cent from their reference value. These parameter sets were only accepted if they reached a steady state in up to 5000 minutes of simulation time.
Non-identifiable parameters were not varied in order to optimize the parameter space exploration, because varying non-identifiable parameters would yield small changes in system behavior. Additionally, linear combinations of perturbations in identifiable parameters can mimic most of the effects of perturbations in non-identifiable perturbations. Input fluxes were also varied although they were not identifiable. This variation was necessary to accommodate changes in effluxes (i) and still allow for the system to achieve a steady state.

Results of the Monte-Carlo Exploration of the Parameter Space
Among the initial 79,993 combinations of parameter values, 79,985 reach a steady-state within 5000 minutes and 1452 have phosphoinositide levels within the intervals retrieved from the literature. Among these, 231 replicate the relative amounts between the phosphoinositide pools. Intriguingly, only 117 (the set used to parameterize the model and 116 alternatives), less than 0.15% of all surveyed sets, satisfy the previous conditions plus the fact that effluxes are less in magnitude than 7% and the influxes are less than 25% of the respective phosphoinositide pools. Indeed, the 116 alternative sets of parameters have outputs that are similar to the ones produced by the optimized parameter set but are less concordant with of the phenomena retrieved in the literature. For example, as can be seen in Fig. 5, taking the score of the manually found parameter set as a base score of 20, all of the 116 admissible alternative parameter sets have a higher score, i.e., present less concordance with the conditions of the scoring function.
The results of this analysis suggest that the information available is sufficient to restrict the parameter space to a region that is compatible with all experimental observations. The alternative parameter sets are characterized by alterations that cancel each other in the system's input of PI, PI(3)P and PI(4)P and output fluxes.
The conditions imposed to find alternative parameter sets are not very restrictive. For example, we accepted data sets with the levels of PI between 200,000 and 400,000 and for PI(4)P and PI(4,5)P2 between 5,000 and 20,000. As such, the small number of parameter sets that satisfy all the tests was not due to narrow test conditions but a genuine scarcity of combinations of parameters that satisfy all pertinent phenomena reported in the literature.
It is impossible to discern which set is the best representation of reality, not only because there are gaps in information, like missing measurements of the input fluxes, but also because these parameter sets can correspond to different cell types, states or membrane configurations. However, the initial parameter set used in the model has kinetic parameters in accordance with the literature and steady-state values that are closest to the center of the reported intervals for phosphoinositide levels.

Preparation of siRNA coated multi-well plates
Multi-well plates (384-well plates) (BD Falcon #353962) were coated with customized siRNAs (Silencer® Select, Ambion) for solid-phase reverse transfection adapted from a previously reported protocol (Erfle H et al., 2007). An aqueous 0.2% (w/v) gelatine solution was prepared and filtered with 0.45µM pore size filter and a 0.4M glucose solution was prepared in Opti-MEM (Gibco #51985). Then, a transfection mix was prepared by mixing 1.662mL of the sucrose/Opti-MEM solution, 969µL of Lipofectamine® 2000 (Gibco #12566014) and 969µL doubly distilled water. This transfection mix was distributed into a 96-conic well plate (35µL/well, "Plate A"). In parallel, fibronectin was diluted in the 0.2% gelatine solution to a concentration of 1%. This solution was distributed into another 96-conic well plate (96µL/well, "Plate B"). Then, 5µL of a 3µM siRNA solution and 7µL of the transfection mix ("Plate A") were incubated in each well of a low volume 384 well plate ("Plate C"). After 20-min incubation, 7µL of the fibronectin solution ("Plate B") were added. 3µL of the contents of each well in "Plate C" were diluted fifty fold in a 384 deep well plate using doubly distilled water. Finally, 15µL of each well were transferred to a 384-well imaging plate, lyophilized and stored in an anhydrous atmosphere before cell seeding.

Image Acquisition
The first row of the cells in the 384-well plate was imaged in the wide-field Olympus Scan^R microscope with a 10x objective (Olympus, UPSAPO) in the Cy3 channel and DAPI channel with an exposure time of 5ms and coarse auto-focus (two images per well, 5min/row). Then FMP containing 30µM amiloride hydrochloride (Sigma #A7410) was added by an automatic liquid dispenser adapted to the microscope stage. The image acquisition was initiated after 5min of incubation with amiloride.

Image Analysis
Overall transfection efficiency was assessed by observing if cells transfected with siRNAs compromising chromosome segregation exhibited mitotic phenotypes (Simpson et al., 2012). Failure to observe these phenotypes in more than 75% of images implied the rejection of the corresponding plate from analysis.

 
Each image was corrected for the background and several quality controls were applied: cells with too high or too low intensity; abnormal area shape, eccentricity; low number of cells per well; cells out-of-focus, localized near the edges or that changed their position in the well were not quantified. The amiloride-sensitive ratios were further analysed in R in order to automatically identify the potential "hits", which are the ones whose ratio deviate more than two standard deviations from the ratio of the negative control, as calculated from the following formula:

  
Where SDM is a standard deviation of the mean and "scrambled" siRNA was used as a negative control.
We considered as significant Amil-sensitive ENaC functional effects, those whose magnitudes were larger than twice the negative control´s SDM. Therefore, we defined ENaC function enhancers as those conditions having a Deviation Score above +1 and ENaC function inhibitors as those having a Deviation Score below -1. Additionally, Student´s t-test was performed to quantify statistical significance versus the corresponding negative control.

Supplementary Figure 1 (SF1) -Evolution of sensitivities due to an increasing level of uncertainty and parameter identifiability. a) Evolution of the number of extreme sensitivities
with absolute values greater than 1 as a function of the level of uncertainty in parameter values. MAX corresponds to the total number of extremes in the model (736 extremes, which is twice the number of parameters). A noticeable increase begins around the 50% uncertainty level. b) Evolution of the standard deviations of sensitivities due to increasing uncertainty. The boxplot corresponding to 100% uncertainty does not show all outliers, the highest of which is 48.09. This finding suggests that the variance in sensitivity is moderate at least up to 50% uncertainty. c) Evolution of the percentage of positive sensitivities according to parameter value uncertainty.
Each line corresponds to one pair of a parameter and a dependent variable. For low levels of uncertainty, the majority of the observed sensitivities are almost all positive or all negative, which means that the sign of the sensitivities are constant for small perturbations. The reason for this consistency is that the background parameter sets lead to models with similar behaviour. The result also suggests that the model behaviour is consistent even under moderate uncertainty and only starts to vary for 50% or more uncertainty in parameter values. d) Plot showing the best-identifiable parameters. i represents for the rate constant for all exit fluxes. Graphs created in R 22 .