Abstract
The optimization problem of identifying minimal sets of nodes able to drive the dynamics of Boolean networks toward desired long-term behaviors is central to some applications, for example the detection of key therapeutic targets to control pathways in models of biological signaling and regulatory networks. Here, we develop a method to solve this optimization problem, taking inspiration from the well-studied problem of influence maximization for spreading processes in social networks. We validate the method on small gene regulatory networks whose dynamical landscapes are known by means of brute-force analysis. We then systematically study a large collection of gene regulatory networks. We find that for about 65% of the analyzed networks, the minimal driver sets contain fewer than 20% of their nodes.
Introduction
Determining the influence of nodes in networks is critical for understanding and controlling real-world systems^{1}. Applications include identifying super spreaders in marketing and political campaigns^{2}, immunization targets for disease containment^{3}, vulnerable nodes in financial networks^{4}, and key therapeutic targets in biological signaling and regulatory networks^{5,6,7}.
A large body of research within the network science literature focuses on the problem of influence maximization, i.e., identifying the nodes that maximize a spreading process on a network^{8}. In the standard problem setting, all nodes in the network are initially set to an inactive state. A fixed number of seed nodes are activated to initiate a spreading process in the network. The influence of the seed set is measured in terms of the size of the outbreak, i.e., the total number of nodes activated during the spreading process. The size of the outbreak depends on the rules of the spreading process, the structure of the underlying network, and the nodes in the seed set. The problem of influence maximization is thus the identification of the seed set, among all possible sets of prescribed size, that generates the largest average outbreak in the network. This problem is known to be NP-hard for both simple and complex contagion processes, and thus exactly solvable only in extremely small networks^{9}. Approximate but effective strategies to solve the influence maximization problem have computational complexity that ranges from cubic to linear^{10}.
Here, we consider the generalization of the influence maximization problem to Boolean networks, a type of discrete dynamical system^{11}. In a Boolean network, we are not necessarily interested in maximizing the number of nodes that are activated during the dynamics; rather, our goal is to identify sets of driver nodes that are sufficient to control the dynamical system toward some desired final configuration^{12,13,14}. Note that some of the spreading models considered in the standard formulation of the influence maximization problem, e.g., the linear threshold model^{15}, can be seen as special cases of the more general family of problems that we consider here. The problem of identifying a driver set in Boolean networks is known to be solvable in polynomial time for tree-like structures, but NP-hard for general networks^{14}. One way in which we simplify the problem is by focusing on identifying the smallest driver set for a specific attractor^{16,17}. Our minimization problem is analogous to the one generally considered in control theory for networks^{18}, but with the nontrivial goal of accounting for the nonlinearity characterizing Boolean dynamics.
We remark that Boolean networks can be mapped to discrete-time linear dynamical systems^{19}. However, the exact optimization problem is computationally intractable because the size of the resulting system is 2^{N}, with N equal to the number of nodes in the original Boolean network. One way to obtain approximate solutions in reasonable time relies on utilizing only the structure of the underlying Boolean network, by linearizing the nonlinear dynamical rules that regulate the evolution of the system^{20}. Whereas linear structure-based control methods may be effective for some types of Boolean dynamics, they do not provide any guarantee of finding the best set of nodes to control a network^{17}.
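To make the 2^{N} scaling concrete, the sketch below writes a hypothetical two-node network (x0' = x1, x1' = x0 AND x1; a toy example of ours, not one from the text) as a linear map acting on a probability vector over all 2^{N} configurations. It is one simple way to obtain a linear representation and is not necessarily the construction used in ref. ^{19}.

```python
import numpy as np

# Hypothetical 2-node network: x0' = x1, x1' = x0 AND x1
N = 2

def step(config):
    x0, x1 = config
    return (x1, x0 & x1)

def index(config):
    """Encode a configuration as an integer (node 0 is the most significant bit)."""
    return sum(bit << (N - 1 - i) for i, bit in enumerate(config))

def config_of(idx):
    return tuple((idx >> (N - 1 - i)) & 1 for i in range(N))

# 2**N x 2**N matrix: column c has a single 1 in the row of c's successor
T = np.zeros((2**N, 2**N))
for c in range(2**N):
    T[index(step(config_of(c))), c] = 1.0

p = np.full(2**N, 1.0 / 2**N)  # uniform distribution over initial configurations
for _ in range(5):
    p = T @ p                  # the configuration distribution evolves linearly
print(p)  # mass 0.75 on (0, 0) and 0.25 on (1, 1)
```

The matrix has 4^{N} entries, which is why this exact linear description is usable only for very small N.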
Another way to approximate sufficient driver node sets is to control the nodes of a feedback vertex set, i.e., a set of nodes whose removal breaks all cycles of the network^{21,22}. This method does not linearize the dynamics and finds a driver set that can control the network to any attractor; however, it can only provide a driver set that controls the whole ensemble of dynamical systems compatible with the same network structure, and it has nonpolynomial complexity^{21,23}, which hinders its application to large networks. Other methods exist to find optimal driver sets toward specific attractors of a given Boolean network. For example, Zañudo and Albert successively find partial fixed points of the network dynamics (called stable motifs) that guide the dynamics toward an attractor of interest^{24}; Kim et al. use genetic algorithms and a network’s attractor landscape to find the minimal driver set (called the control kernel) that guides the dynamics to a specified attractor^{25}; Borriello and Daniels similarly find control kernels by pinning, to their state in the desired attractor, the nodes that distinguish that attractor from the other attractors^{26}. Unfortunately, all of these methods have exponential complexity in the general case.
Toward a feasible approximate method for identifying the nodes that control the dynamics toward a desired attractor, we first deploy an individual-based mean-field approximation (IBMFA) for Boolean network dynamics. As in the IBMFA used in the study of spreading processes on networks^{27}, our IBMFA neglects dynamical correlations among variables, so that every node integrates the behavior of its neighbors averaged over an infinite number of independent realizations of the dynamical process. Mean-field approaches to Boolean network dynamics exist, including the classic approach by Derrida and Pomeau^{28} and other recent attempts, e.g., refs. ^{29,30,31,32}. Those attempts are developed for so-called annealed networks; they are therefore devised to deal with ensembles of networks in which the network structure and/or the rules of the Boolean dynamics are stochastic. The resulting approximations describe the behavior of the dynamical system averaged over the given ensemble of networks. We differ from these previous attempts by developing an approach that is valid for quenched networks. Our approach takes as input a given network structure and prescribed Boolean rules, and generates as output the average trajectory of the dynamical system started from stochastic initial configurations. We show that our IBMFA accurately reproduces the average dynamical behavior estimated from numerical simulations of individual node states in both random Boolean networks (RBNs) and gene regulatory networks (GRNs).
Second, we introduce a statistical notion of control, or influence, for Boolean dynamics. The influence of a set of nodes is quantified in terms of the entropy associated with the long-term configurations reachable by the system when controlled by that specific set of nodes but otherwise started from a maximally uncertain initial condition. We construct optimal sets of influential nodes by means of greedy optimization^{33}. The algorithm scales cubically with the network size, thus allowing the analysis of systems that cannot be studied with brute-force approaches. The algorithm is used to approximate the minimum-size driver set required to reach a known attractor by constraining the search to configurations compatible with the attractor. The algorithm is also used in unconstrained searches for optimal sets of nodes able to drive the system toward an attractor with a large basin. We validate our method on known attractors of the Drosophila melanogaster segment polarity single-cell and parasegment networks^{34}. We also recover known effects of anticancer drugs on the estrogen receptor breast cancer network^{35}, and find minimal driver sets in the networks representing the yeast Saccharomyces cerevisiae cell cycle^{36} and T cell large granular lymphocyte leukemia^{37}. We then systematically apply our method to large collections of synthetic and real networks. We find that the relative size of the optimal driver set in RBNs toward unspecified attractors increases with the degree of the network but is invariant with the system size. GRNs within the Cell Collective repository^{38} are also characterized by optimal driver sets whose relative size is independent of the system size. Our predictions for networks within the Cell Collective repository^{38} are in very good agreement with those obtained by the method of Borriello and Daniels^{26}.
Results
Accuracy of the individual-based mean-field approximation
We consider arbitrary Boolean networks^{39}; see the Methods Section “Boolean networks” for details. The state of node i at time t is denoted by the binary-valued variable σ_{i}(t) ∈ {0, 1}, while a generic configuration of the system is denoted by \(\overrightarrow{\sigma }(t)=[{\sigma }_{1}(t),\ldots ,{\sigma }_{N}(t)]\), with N standing for the network size. The dynamics of node i is specified by the lookup table F_{i}, whose inputs are the states of all its k_{i} neighbors; see Eq. (1). Under synchronous updating, given the configuration \(\overrightarrow{\sigma }(t)\), the system evolves deterministically to another configuration \(\overrightarrow{\sigma }(t+1)\). A full description of the system’s dynamics can be given in terms of its associated state-transition graph (STG), where each node corresponds to one of the 2^{N} possible configurations and a single directed edge indicates the transition from one configuration to the next. Understanding the dynamical properties of a Boolean network from its STG is straightforward. However, because the size of the STG grows exponentially with the system size, its practical relevance is limited to very small systems.
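The lookup-table formulation and the brute-force STG can be sketched as follows. The three-node network (x0 = x1 OR x2, x1 = NOT x0, x2 = x0 AND x1) and the names `inputs`, `tables`, and `sync_step` are hypothetical illustrations, and we assume that row r of F_{i} reads the input states as a binary number with the first input as the most significant bit.

```python
from itertools import product

# Hypothetical toy network: inputs[i] lists the k_i regulators of node i,
# tables[i] is the lookup table F_i with 2**k_i rows
inputs = {0: [1, 2], 1: [0], 2: [0, 1]}
tables = {0: [0, 1, 1, 1],   # OR
          1: [1, 0],         # NOT
          2: [0, 0, 0, 1]}   # AND

def row_index(state, regs):
    """Row of the lookup table selected by the regulators' states."""
    r = 0
    for j in regs:
        r = (r << 1) | state[j]
    return r

def sync_step(state):
    """Synchronous update: every node reads the configuration at time t."""
    return tuple(tables[i][row_index(state, inputs[i])] for i in sorted(inputs))

# Brute-force STG: one outgoing edge per configuration, 2**N configurations in total
N = len(inputs)
stg = {s: sync_step(s) for s in product((0, 1), repeat=N)}

def attractor_from(s):
    """Follow the STG until a configuration repeats; return the cycle reached."""
    seen = []
    while s not in seen:
        seen.append(s)
        s = stg[s]
    return tuple(seen[seen.index(s):])

attractors = {frozenset(attractor_from(s)) for s in stg}
print(len(stg), attractors)
```

For this toy network, every one of the 8 configurations ends on a single limit cycle of length 5. Enumerating `stg` is exactly the step whose cost doubles with every added node.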
To overcome this limitation, we deploy an individual-based mean-field approximation (IBMFA) aimed at describing the average behavior of nodes in the Boolean network; see Section “Individual-based mean-field approximation” for details. In the IBMFA, the dynamical state of node i is represented by the real-valued variable s_{i}(t), standing for the probability of finding the node active at stage t of the dynamics, i.e., s_{i}(t) = P(σ_{i}(t) = 1). In this model of the dynamics, each node is influenced only by the average behavior of its neighbors, i.e., dynamical correlations among variables are neglected; see Eq. (4). The IBMFA requires summing over all the entries of the lookup tables. Thus, unlike STG-based methods but similarly to causal graph methods^{16,40}, the computational cost of the approximation grows linearly with N and exponentially with the degree of the nodes. As a result, the IBMFA is feasible to compute in large networks as long as node degrees are not too large, making it applicable to many real-world sparse networks.
The average trajectory computed by the IBMFA is taken over realizations of the dynamical system and is conditioned on the distribution of the initial configuration \(P(\overrightarrow{\sigma }(t=0)\,|\,\overrightarrow{s}(t=0))\); see Eq. (3). As a result, if the initial condition is certain, i.e., s_{i}(0) ∈ {0, 1} for all i, then the IBMFA reproduces the ground-truth trajectory on the STG started from the given initial configuration, because every probability entering the mean-field update is then either 0 or 1. Also, no average is taken over either the network structure or the Boolean lookup tables. This is the main point of differentiation between our IBMFA and existing mean-field approaches for Boolean networks^{28,29,30,31,32}.
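A minimal sketch of one IBMFA step, assuming the factorized form s_{i}(t+1) = Σ_{r} F_{i}(r) Π_{m} s_{j_m}(t)^{b_m} [1 − s_{j_m}(t)]^{1−b_m}, where r runs over the rows of F_{i} and b_{m} is the bit of input j_{m} in row r. This is our reading of "summing over all the entries of the lookup tables" while neglecting correlations; the exact notation of Eq. (4) is not reproduced in this section. The toy network and names are hypothetical.

```python
from itertools import product

# Hypothetical toy network; row r of tables[i] reads the inputs as a binary number
inputs = {0: [1, 2], 1: [0], 2: [0, 1]}
tables = {0: [0, 1, 1, 1], 1: [1, 0], 2: [0, 0, 0, 1]}

def ibmfa_step(s):
    """One mean-field update: inputs are treated as independent random variables."""
    new = []
    for i in sorted(inputs):
        regs = inputs[i]
        total = 0.0
        # product() enumerates rows in the same order as the lookup table
        for r, bits in enumerate(product((0, 1), repeat=len(regs))):
            if tables[i][r] == 0:
                continue  # rows with output 0 do not contribute to P(sigma_i = 1)
            prob = 1.0
            for j, b in zip(regs, bits):
                prob *= s[j] if b == 1 else 1.0 - s[j]
            total += prob
        new.append(total)
    return new

print(ibmfa_step([0.5, 0.5, 0.5]))  # [0.75, 0.5, 0.25]

# With a certain initial condition the IBMFA follows the deterministic trajectory
s = [0.0, 0.0, 0.0]
for t in range(5):
    s = ibmfa_step(s)
    print(t + 1, s)
```

The nested loop over rows is what makes the cost exponential in k_{i} but linear in N.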
In order to test the accuracy of our approximation, we compare IBMFA predictions with results from numerical simulations. Some results of our tests are shown in Fig. 1.
First, we test the accuracy of the IBMFA on random Boolean networks (RBNs); see Fig. 1a. We generate RBNs with N = 100, in which each node has degree k. The output of each of the 2^{k} rows of the lookup table F_{i} of node i is set to either 0 or 1 with equal probability. The results of Fig. 1a correspond to averages over 100 independent RBNs for each value of k. Initial configurations of the dynamics are randomly generated according to the probability of Eq. (3), where we set s_{i}(t = 0) = 1/2 for all nodes i. We sample R = 100 random initial configurations per network instance. We measure the mean squared error of the IBMFA prediction with respect to the ground-truth average trajectory estimated from numerical simulations; see Eq. (5). For k > 1, the error plateaus at a non-null value at t ≃ 10; furthermore, the error decreases as the degree increases from k = 2 to k = 3. For k = 1, the error quickly goes to zero. This is due to the peculiar ring structure of the graph. Because our goal is to determine influence on long-term configurations independently of the starting conditions, it is important that the IBMFA accurately reproduces the behavior of the system averaged over all possible configurations. Empirically, an observer may measure only one instance of a network’s dynamics, which may not be very indicative of the average dynamical behavior. The comparison with the variance of the sample of configurations used to estimate the ground-truth average trajectory (see Eq. (6)) tells us that the IBMFA is more informative than observing the outcome of a single instance of the network dynamics if the goal is to predict the average behavior of the system.
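The RBN test can be sketched with NumPy as below. Several details are assumptions on our part: inputs are drawn without self-loops, the error at each step is the node-averaged squared difference between the IBMFA probabilities and the empirical mean state over R runs (our reading of Eq. (5)), and the function names are hypothetical.

```python
import numpy as np

def random_boolean_network(n, k, rng):
    """RBN sketch: k distinct random inputs per node (self-loops excluded, an assumption)
    and a lookup table whose 2**k outputs are 0 or 1 with equal probability."""
    inputs = np.array([rng.choice(np.delete(np.arange(n), i), size=k, replace=False)
                       for i in range(n)])
    tables = rng.integers(0, 2, size=(n, 2**k))
    return inputs, tables

def sync_step(states, inputs, tables):
    """Synchronous update of a batch of configurations with shape (R, n)."""
    k = inputs.shape[1]
    weights = 2 ** np.arange(k - 1, -1, -1)            # first input = most significant bit
    rows = (states[:, inputs] * weights).sum(axis=2)   # lookup-table row of each node, (R, n)
    return tables[np.arange(tables.shape[0]), rows]

def ibmfa_step(s, inputs, tables):
    """Mean-field update: each row's probability is a product over independent inputs."""
    k = inputs.shape[1]
    bits = (np.arange(2**k)[:, None] >> np.arange(k - 1, -1, -1)) & 1   # (2**k, k)
    s_in = s[inputs][:, None, :]                                         # (n, 1, k)
    row_prob = np.where(bits[None] == 1, s_in, 1.0 - s_in).prod(axis=2)  # (n, 2**k)
    return (tables * row_prob).sum(axis=1)

def mse_trajectory(n=100, k=2, R=100, T=20, seed=0):
    """Per-step mean squared error between the IBMFA and the empirical mean over R runs."""
    rng = np.random.default_rng(seed)
    inputs, tables = random_boolean_network(n, k, rng)
    states = rng.integers(0, 2, size=(R, n))   # s_i(0) = 1/2 for every node
    s = np.full(n, 0.5)
    errors = []
    for _ in range(T):
        states = sync_step(states, inputs, tables)
        s = ibmfa_step(s, inputs, tables)
        errors.append(np.mean((states.mean(axis=0) - s) ** 2))
    return np.array(errors)

for k in (1, 2, 3):
    print(k, mse_trajectory(k=k)[[0, 9, 19]])
```

Averaging such curves over many network instances, as done for Fig. 1a, only requires looping over `seed`.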
We also test the IBMFA on gene regulatory networks (GRNs) and other biological signaling networks, including the single-cell Drosophila melanogaster segment polarity network (SPN); see Fig. 1b. The network is composed of only N = 17 nodes and is thus still approachable by means of brute-force STG analysis. Initial configurations of the dynamics are randomly generated according to the probability of Eq. (3), where we set s_{i}(t = 0) = 1/2 for all nodes i. Results are obtained over R = 100 independent initial configurations of the dynamics. Here too, the error reaches a nonzero plateau value for t ≃ 10. The plateau value of the baseline error is much larger than the one observed for the IBMFA. Similar findings are also obtained for the networks representing the T cell large granular lymphocyte (TLGL) leukemia (N = 60) and the estrogen receptor (ER+) breast cancer (N = 80).
In the Supplementary Information (SI), we repeat the analysis for different updating schemes, including deterministic asynchronous, stochastic asynchronous, and block-deterministic updating^{41,42,43}. We find that the evolution of the mean squared error associated with the IBMFA depends on the specific updating scheme considered; see Fig. S1. For the Drosophila melanogaster SPN, the long-term value of the IBMFA error is almost insensitive to the updating scheme. For the yeast cell-cycle network, instead, we observe more apparent differences between IBMFA errors depending on the updating scheme.
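The difference between updating schemes can be illustrated on a hypothetical three-node toy network. The functions below follow common textbook definitions of synchronous, deterministic asynchronous (fixed order), and stochastic asynchronous (one random node per step) updating; the precise variants used in refs. ^{41,42,43} may differ, and block-deterministic updating is not shown.

```python
import random

# Hypothetical toy network; row r of tables[i] reads the inputs as a binary number
inputs = {0: [1, 2], 1: [0], 2: [0, 1]}
tables = {0: [0, 1, 1, 1], 1: [1, 0], 2: [0, 0, 0, 1]}

def update_node(state, i):
    """New value of node i read from its lookup table (first input = most significant bit)."""
    r = 0
    for j in inputs[i]:
        r = (r << 1) | state[j]
    return tables[i][r]

def synchronous(state):
    """All nodes read the configuration at time t."""
    return [update_node(state, i) for i in range(len(state))]

def deterministic_asynchronous(state, order):
    """Nodes are updated one at a time in a fixed order, each reading the latest values."""
    new = list(state)
    for i in order:
        new[i] = update_node(new, i)
    return new

def stochastic_asynchronous(state, rng):
    """A single uniformly chosen node is updated; all others keep their state."""
    i = rng.randrange(len(state))
    new = list(state)
    new[i] = update_node(state, i)
    return new

x = [0, 0, 0]
print(synchronous(x))                            # [0, 1, 0]
print(deterministic_asynchronous(x, [0, 1, 2]))  # [0, 1, 0]
print(deterministic_asynchronous(x, [1, 2, 0]))  # [1, 1, 0]
print(stochastic_asynchronous(x, random.Random(0)))
```

Even from the same configuration, the update order changes the successor, which is why trajectories, and hence IBMFA errors, can depend on the scheme.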
Dynamical influence of nodes
We study how externally controlled nodes affect the dynamical behavior of a Boolean network. Based on the analogy with problems considered in the context of spreading processes on social networks^{8,9}, we use the term seed for a node that is externally controlled, and the term influence for the effect of one or more seeds on the dynamics of the network. Seeds with maximal influence are also known as driver nodes in the context of controllability.
We denote a generic set of seed nodes as \(\mathcal{X}\); see Section “Definition of seed set”. Strictly speaking, \(\mathcal{X}\) is a set of tuples of the type \((i,\hat{\sigma}_{i})\), each specifying the label of a node belonging to the set as well as its imposed state value. We say that node i belongs to \(\mathcal{X}\), i.e., \(i\in\mathcal{X}\), if the label of node i appears in one of the tuples of the set. Note that node i may appear in at most one tuple of \(\mathcal{X}\), either as \((i,\hat{\sigma}_{i}=0)\) or as \((i,\hat{\sigma}_{i}=1)\). Nodes in the set \(\mathcal{X}\) can influence the dynamics of the nodes that do not belong to \(\mathcal{X}\); we assume, however, that seeds do not change their state during the dynamics of the system, i.e., \(\sigma_{i}(t)=\hat{\sigma}_{i}\) for all \(i\in\mathcal{X}\) and for all t ≥ 0. This is known in the literature as pinning control^{17,21,23}. The assumption is identical to the one underlying the study of influence in irreversible spreading processes on social networks^{9}. Here, the invariance of the dynamical state of the nodes in \(\mathcal{X}\) serves to model systems whose typical time scale is much shorter than the time scale over which the state of the seed nodes is perturbed. This is a good assumption for some applications, for example the study of the effect of drugs on GRN dynamics.
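Pinning control can be sketched by representing \(\mathcal{X}\) as a frozenset of (node, state) tuples and overwriting the seeds after every synchronous update. The toy network and the helper names `make_seed_set`, `apply_pins`, and `pinned_step` are hypothetical; in this example, pinning node 1 to 0 drives every initial configuration to (0, 0, 0).

```python
from itertools import product

# Hypothetical toy network: x0 = x1 OR x2, x1 = NOT x0, x2 = x0 AND x1
inputs = {0: [1, 2], 1: [0], 2: [0, 1]}
tables = {0: [0, 1, 1, 1], 1: [1, 0], 2: [0, 0, 0, 1]}

def make_seed_set(pairs):
    """Seed set X as a frozenset of (node, pinned_state) tuples."""
    nodes = [i for i, _ in pairs]
    if len(nodes) != len(set(nodes)):
        raise ValueError("a node may appear in at most one tuple of X")
    if any(v not in (0, 1) for _, v in pairs):
        raise ValueError("pinned states must be 0 or 1")
    return frozenset(pairs)

def apply_pins(state, seeds):
    state = list(state)
    for i, v in seeds:
        state[i] = v
    return tuple(state)

def pinned_step(state, seeds):
    """Synchronous update in which seeds keep their imposed state at every t >= 0."""
    new = []
    for i in range(len(state)):
        r = 0
        for j in inputs[i]:
            r = (r << 1) | state[j]
        new.append(tables[i][r])
    return apply_pins(new, seeds)

X = make_seed_set([(1, 0)])
for init in product((0, 1), repeat=3):
    s = apply_pins(init, X)  # the seed is pinned from t = 0
    for _ in range(3):
        s = pinned_step(s, X)
    print(init, "->", s)
```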
We assess the influence of the seed set \(\mathcal{X}\) in a Boolean network in terms of the residual uncertainty about the states of the nodes that do not belong to \(\mathcal{X}\). It is measured assuming that the state of the nodes in \(\mathcal{X}\) is known and does not change during the dynamics. The notion is similar to the one used in ref. ^{44}. Specifically, we assume that the initial configuration is randomly sampled from the distribution of Eq. (3). In doing so, the state of the nodes in \(\mathcal{X}\) is set deterministically, i.e., \(s_{i}(t=0)=\hat{\sigma}_{i}\) for all \(i\in\mathcal{X}\). By contrast, we have maximal uncertainty for all other nodes, i.e., s_{i}(t = 0) = 1/2 for \(i\notin\mathcal{X}\). We measure the residual uncertainty of the system at time t as the entropy of the probability distribution of the configurations reachable by the system at time t, conditioned on the known state of the nodes in \(\mathcal{X}\); see Eq. (7). The lower the long-term residual uncertainty of the system, the larger the dynamical influence of the set \(\mathcal{X}\).
To speed up the computation of the entropy, we rely on the IBMFA. The approximation provides a state probability s_{i}(t) for each node i in the network, and these values can be readily plugged into Eq. (7) to compute the entropy. The use of the IBMFA is justified by a good level of agreement, at the level of both individual nodes and configurations, with ground-truth entropy estimates from numerical simulations (see Figs. S2 and S3 for details).
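Because the IBMFA treats node states as independent, the entropy of the configuration distribution reduces to a sum of per-node binary entropies. The sketch below assumes that this is how the s_{i}(t) values enter Eq. (7), uses base-2 logarithms, and takes T = 10 steps as the long-term horizon, as in the text. The toy network and function names are hypothetical. Pinning node 2 to 0 leaves 2 bits of residual entropy, while pinning it to 1 removes all uncertainty: the imposed state matters, not just the node.

```python
import numpy as np
from itertools import product

# Hypothetical toy network: x0 = x1 OR x2, x1 = NOT x0, x2 = x0 AND x1
inputs = {0: [1, 2], 1: [0], 2: [0, 1]}
tables = {0: [0, 1, 1, 1], 1: [1, 0], 2: [0, 0, 0, 1]}

def ibmfa_step(s):
    """Mean-field update with independent inputs (rows read as binary numbers)."""
    new = np.zeros(len(s))
    for i, regs in inputs.items():
        for r, bits in enumerate(product((0, 1), repeat=len(regs))):
            if tables[i][r]:
                new[i] += np.prod([s[j] if b else 1.0 - s[j] for j, b in zip(regs, bits)])
    return new

def binary_entropy(p):
    """Per-node entropy in bits, with 0 log 0 taken as 0."""
    p = np.asarray(p, dtype=float)
    h = np.zeros_like(p)
    m = (p > 0) & (p < 1)
    h[m] = -(p[m] * np.log2(p[m]) + (1 - p[m]) * np.log2(1 - p[m]))
    return h

def longterm_entropy(seeds, n=3, T=10):
    """Sum of node entropies after T IBMFA steps with the seeds pinned."""
    s = np.full(n, 0.5)
    for i, v in seeds:      # seeds are known with certainty at t = 0
        s[i] = v
    for _ in range(T):
        s = ibmfa_step(s)
        for i, v in seeds:  # and stay pinned at every later step
            s[i] = v
    return float(binary_entropy(s).sum())

for X in ([], [(2, 0)], [(2, 1)]):
    print(X, longterm_entropy(X))
```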
We monitor the entropy of various Boolean networks conditioned on different seed sets \(\mathcal{X}\). We find that different sets of nodes can have very different dynamical influence on the network. The effect strongly depends not just on which nodes are in the set but also on the state imposed on these nodes.
In Fig. 2a, for example, we show how the entropy of the Drosophila melanogaster SPN evolves in time for different choices of the set \(\mathcal{X}\). Clearly, maximal initial uncertainty is present for \(\mathcal{X}=\emptyset\). This uncertainty typically decreases as the system dynamics evolves. Some nodes have more influence than others in reducing the residual entropy of the system. For example, the set \(\mathcal{X}=\{(\mathrm{en},\hat{\sigma}_{\mathrm{en}}=1)\}\) has more dynamical influence on the network than the set \(\mathcal{X}=\{(\mathrm{CIR},\hat{\sigma}_{\mathrm{CIR}}=1)\}\). Also, it is important to stress that dynamical influence depends not just on the identity of the nodes, but also on their imposed state. For example, \(\mathcal{X}=\{(\mathrm{CIR},\hat{\sigma}_{\mathrm{CIR}}=1)\}\) has larger dynamical influence than \(\mathcal{X}=\{(\mathrm{CIR},\hat{\sigma}_{\mathrm{CIR}}=0)\}\). In all cases, entropy values plateau after t ≃ 10 stages of the dynamics. We treat this value as representative of the long-term behavior of the system, and use it in our operative definition of the long-term dynamical influence of a seed set. We find that entropy values similarly plateau after t ≃ 10 in other biological networks and, importantly, that there is little or no change in the rank order of seed sets after this value. We therefore use this value in our analysis of all biological networks unless stated otherwise.
These considerations are confirmed when network dynamics evolve according to different updating schemes; see Figs. S4–S5. System entropy follows a trajectory that depends on the specific updating scheme considered; however, for the Drosophila melanogaster SPN its long-term value is almost the same irrespective of the updating scheme. Some differences appear for the yeast cell-cycle network.
We further study systematically the influence of all \(2^{|\mathcal{X}|}\binom{N}{|\mathcal{X}|}\) possible seed sets of size \(|\mathcal{X}|\le 3\) for the Drosophila melanogaster SPN and the TLGL leukemia network (see Figs. S6 and S7). For the Drosophila melanogaster SPN, we find that three sets of size \(|\mathcal{X}|=3\) are able to reduce the long-term residual entropy to zero; none of the sets of size \(|\mathcal{X}|<3\) results in a null residual entropy. These results are consistent with the known control portrait of this network^{17}. As a comparison, for the larger TLGL leukemia network, no seed set of size \(|\mathcal{X}|\le 3\) leads to residual entropy equal to zero.
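The count \(2^{|\mathcal{X}|}\binom{N}{|\mathcal{X}|}\) (choose the nodes, then a pinned state for each) shows why exhaustive enumeration is limited to small seed sets. A quick check for the two network sizes quoted above (N = 17 and N = 60), counting non-empty sets only; `n_seed_sets` is a hypothetical helper name.

```python
from math import comb

def n_seed_sets(N, size):
    """Number of seed sets with `size` nodes: choose the nodes, then a state for each."""
    return 2**size * comb(N, size)

for name, N in [("Drosophila SPN", 17), ("TLGL leukemia", 60)]:
    counts = [n_seed_sets(N, x) for x in (1, 2, 3)]
    print(name, counts, sum(counts))
# Drosophila SPN [34, 544, 5440] 6018
# TLGL leukemia [120, 7080, 273760] 280960
```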
For the ER+ breast cancer network, consistent with the original goals of the model^{35}, we focus our analysis on the dynamical influence of the input nodes representing anticancer drugs (Fig. 2b). Previous literature has shown that some drugs affect cell apoptosis or proliferation more than others^{35,40}. With our approach, we reproduce those results. Specifically, we consider only biologically relevant configurations of the ER+ breast cancer cell baseline, where 12 nodes representing biochemical components involved in signaling pathways are set to a particular state and the drug Alpelisib, a PI3K inhibitor, is active^{35}. We then study the effect that the activation of an additional drug, as a model of multidrug therapy on PIK3CA-mutant breast cancer cells, has on the long-term behavior of the dynamical system, in order to determine which drugs best synergize with Alpelisib. As already shown in ref. ^{40}, we find that only two drugs have a significant impact on system dynamics in this context: Palbociclib and Fulvestrant. With our method, the impact of the two drugs appears as a reduction in the value of the long-term entropy of the network; all other drugs do not reduce entropy beyond the baseline, as shown in Fig. 2b. This is consistent with prior causal analysis of this network, which has shown that only these two drugs affect additional signaling pathways not controlled by the baseline cell configuration, thereby having a synergistic effect with Alpelisib on apoptosis and proliferation of cancer cells in this model^{40}. Thus, our new approximate model reproduces the causal behavior of the network.
Influence maximization
The results of the above sections demonstrate that the IBMFA is effective in approximating the entropy of Boolean networks. Also, the entropy of the system conditioned on a seed set \(\mathcal{X}\) is a meaningful quantity to assess the influence of the set \(\mathcal{X}\) on the long-term dynamical behavior of the network. We leverage these results to develop an efficient algorithm (see Section “Influence maximization”) for the identification of optimal sets of influential nodes in Boolean networks.
The algorithm constructs quasi-optimal sets of seeds with a greedy strategy; see Eq. (8). At each stage of the algorithm, the node whose control leads to the largest drop in the entropy function is added to the set. Greedy optimization has a known performance bound^{33}, and it typically provides high-quality solutions to several discrete optimization problems, for example influence maximization in social networks^{9,10}.
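A minimal sketch of the greedy loop, using a long-term IBMFA entropy on a hypothetical toy network as the objective. We do not reproduce Eq. (8): we simply assume that each stage adds the candidate (node, state) tuple with the lowest resulting entropy, that a node may enter only once, and that the loop stops when the entropy falls below a small tolerance. The names and the toy network are illustrative.

```python
import numpy as np
from itertools import product

# Hypothetical toy network: x0 = x1 OR x2, x1 = NOT x0, x2 = x0 AND x1
inputs = {0: [1, 2], 1: [0], 2: [0, 1]}
tables = {0: [0, 1, 1, 1], 1: [1, 0], 2: [0, 0, 0, 1]}
N = 3

def ibmfa_step(s):
    new = np.zeros(len(s))
    for i, regs in inputs.items():
        for r, bits in enumerate(product((0, 1), repeat=len(regs))):
            if tables[i][r]:
                new[i] += np.prod([s[j] if b else 1.0 - s[j] for j, b in zip(regs, bits)])
    return new

def entropy(seeds, T=10):
    """Long-term IBMFA entropy (bits) with the seeds pinned from t = 0."""
    s = np.full(N, 0.5)
    for i, v in seeds:
        s[i] = v
    for _ in range(T):
        s = ibmfa_step(s)
        for i, v in seeds:
            s[i] = v
    p = s[(s > 0) & (s < 1)]
    h = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    return float(h.sum())

def greedy_driver_set(candidates, entropy, tol=1e-9):
    """Add, one at a time, the (node, state) seed giving the lowest entropy."""
    seeds = frozenset()
    history = [entropy(seeds)]
    while history[-1] > tol:
        used = {i for i, _ in seeds}
        options = [c for c in candidates if c[0] not in used]  # one tuple per node
        if not options:
            break
        best = min(options, key=lambda c: entropy(seeds | {c}))
        seeds = seeds | {best}
        history.append(entropy(seeds))
    return seeds, history

candidates = [(i, v) for i in range(N) for v in (0, 1)]
print(greedy_driver_set(candidates, entropy))
```

Each stage evaluates the objective once per remaining candidate, so the total cost is a polynomial number of IBMFA runs rather than an enumeration of all seed sets.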
A driver set is identified when the entropy reaches a null value, reflecting the fact that the long-term configuration of the system is fully determined by imposing the specified states of the nodes in the driver set. We stress that the set obtained at the end of the algorithm is not necessarily the optimal one. We therefore refine the identified driver set with a post-processing step that removes from the set every node whose removal does not increase the entropy of the system. Similar post-processing techniques are used to refine solutions to other discrete optimization problems^{45,46}.
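The post-processing step can be sketched independently of the entropy estimator. The order in which seeds are tested for removal is not specified in the text; the sketch below assumes a simple sorted order and repeats passes until no seed can be dropped. `prune_driver_set` and the toy objective in the usage example are hypothetical.

```python
def prune_driver_set(seeds, entropy, tol=1e-9):
    """Repeatedly drop a seed whose removal does not raise the entropy of the set."""
    seeds = set(seeds)
    target = entropy(frozenset(seeds))
    changed = True
    while changed:
        changed = False
        for c in sorted(seeds):  # removal order is an assumption
            if entropy(frozenset(seeds - {c})) <= target + tol:
                seeds.remove(c)
                changed = True
                break
    return frozenset(seeds)

def toy_entropy(x):
    """Hypothetical objective in which only the seed (2, 1) matters."""
    return 0.0 if (2, 1) in x else 1.0

print(prune_driver_set({(0, 0), (1, 1), (2, 1)}, toy_entropy))  # frozenset({(2, 1)})
```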
Greedy selection can be used to find the minimal driver set to reach a given attractor or, if no constraints are specified, the minimal set to reach an attractor in the network, see Section “Influence maximization”. This attractor is in a sense the easiest one to find via greedy selection and likely corresponds to the attractor with the largest basin.
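The difference between constrained and unconstrained searches lies only in the pool of candidate seeds. For a known fixed-point attractor, each node can only be pinned to the state it has in that configuration; without constraints, both states are allowed. The sketch below handles fixed points only (limit cycles would need a different notion of compatibility). The helper names are hypothetical.

```python
def candidate_seeds(n, target=None):
    """Candidate (node, state) tuples for greedy selection.

    Unconstrained search: every node with both states. Constrained search toward a
    known fixed point `target`: each node only with the state it has in `target`."""
    if target is None:
        return [(i, v) for i in range(n) for v in (0, 1)]
    if len(target) != n:
        raise ValueError("target must specify the state of every node")
    return [(i, int(target[i])) for i in range(n)]

def reached_target(s, target, tol=1e-9):
    """Check that the IBMFA probabilities s have collapsed onto the target configuration."""
    return all(abs(si - ti) <= tol for si, ti in zip(s, target))

print(candidate_seeds(3))             # 6 candidates
print(candidate_seeds(3, (1, 0, 1)))  # [(0, 1), (1, 0), (2, 1)]
```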
We apply the algorithm to the Drosophila melanogaster SPN; see Fig. 3. The network is known to have 10 different attractors. By constraining greedy selection to target each of these attractors, we are able to find their corresponding optimal driver sets (Fig. 3b). Note that post-processing the driver sets is required to reduce their size significantly without compromising the quality of the solution obtained (Fig. 3c). The minimal driver sets identified by our algorithm closely approximate the ground-truth driver sets of the network obtainable by a brute-force exploration of the STG. Our predictions recover exactly the minimal driver sets for 7 of the 10 fixed points. We overestimate the size of the driver sets required to reach attractors 5, 9, and 10 by one node only (the PTC node). Overestimating the size of the ground-truth driver set by a small margin is reasonable given the level of approximation used by our approach, i.e., the IBMFA neglects dynamical correlations and the greedy optimization strategy is suboptimal. By definition, the constrained versions of the algorithm display higher levels of uncertainty than the unconstrained version at time t = 1; however, because our greedy algorithm is suboptimal, prior to post-processing a smaller constrained driver set is found (attractor 1) than the one chosen by our unconstrained algorithm (attractor 4). Appropriately selecting seeds dramatically reduces the uncertainty of the system: control by the best seed leads to a 60% reduction of the system entropy. By contrast, selecting random seeds has a mild effect on the long-term behavior of the dynamics, and entropy does not vanish even if 10 nodes are controlled.
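Ground-truth minimal driver sets of small networks can be obtained by brute force: test seed sets of increasing size, compatible with the target fixed point, and accept a set only if every initial configuration ends on that fixed point. The sketch below does this for a hypothetical three-node network; it illustrates the exhaustive check in general and is not the procedure used for Fig. 3. The number of steps (2^{N}) guarantees that the controlled trajectory has reached its attractor.

```python
from itertools import combinations, product

# Hypothetical toy network: x0 = x1 OR x2, x1 = NOT x0, x2 = x0 AND x1
inputs = {0: [1, 2], 1: [0], 2: [0, 1]}
tables = {0: [0, 1, 1, 1], 1: [1, 0], 2: [0, 0, 0, 1]}
N = len(inputs)

def step(state, pins):
    """Synchronous update with pinned nodes held at their imposed state."""
    new = []
    for i in range(N):
        r = 0
        for j in inputs[i]:
            r = (r << 1) | state[j]
        new.append(pins.get(i, tables[i][r]))
    return tuple(new)

def drives_to(pins, target):
    """True if every initial configuration (pins applied) ends on the fixed point `target`."""
    for init in product((0, 1), repeat=N):
        s = tuple(pins.get(i, b) for i, b in enumerate(init))
        for _ in range(2**N):  # enough steps to land on an attractor
            s = step(s, pins)
        if s != target or step(s, pins) != target:  # must be reached and stay
            return False
    return True

def minimal_driver_sets(target):
    """Exhaustive search, smallest size first, over seeds compatible with `target`."""
    compatible = [(i, target[i]) for i in range(N)]
    for size in range(N + 1):
        found = [dict(c) for c in combinations(compatible, size) if drives_to(dict(c), target)]
        if found:
            return found
    return []

print(minimal_driver_sets((0, 0, 0)))  # [{1: 0}]
print(minimal_driver_sets((1, 0, 1)))  # [{2: 1}]
```

The double loop over candidate sets and over all 2^{N} initial configurations is what restricts this ground truth to small networks.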
Similar findings hold for other Boolean network models of biochemical regulation and signaling; see Fig. S8. For the yeast cell-cycle network, we identify all optimal sets of seeds that control the system toward its 11 attractors^{36}. We find that the size of the optimal seed sets is at least 4, consistent with what is known about the system^{17}. When compared with the ground truth (see Fig. S9 and Table S1), our method correctly retrieves the minimal driver sets for 4 out of the 11 attractors of the yeast cell-cycle network, overestimates the driver sets by one or two nodes for 5 fixed points, and underestimates the driver set by one node for 2 attractors. We recognize that underestimating the size of the driver sets is not a desirable outcome. On the other hand, our underestimations are still good approximations of the ground truth, in the sense that the solutions found by our approach lead to the right attractors in 93% and 97% of the initial configurations, respectively. In other words, the additional node that is indeed required to always reach the desired fixed points is needed in only 7% and 3% of the possible initial configurations, respectively. We also find that 6 biologically meaningful attractors of the Drosophila melanogaster parasegment network^{34} can be reached by controlling no more than 11 nodes, corresponding to 18% of the nodes in the network. For example, the wild-type attractor can be reached by pinning only 10 nodes, which is a lower estimate than those found by previous methods^{16,23}. In particular, the entropy of the system displays an eightfold reduction after the top 4 seeds are selected by the greedy algorithm for each of the 6 attractors. Additionally, we find that the entropy of the TLGL leukemia network^{37} quickly decreases after the selection of the top 3 nodes via the greedy algorithm, and that the network can be controlled with only 9 nodes (15% of the network size).
We systematically apply the greedy optimization algorithm to RBNs. Results are shown in Fig. 4a. No constraints are imposed in the search for the top influential nodes. We find that the size of the optimal driver set relative to the system size is a constant that depends only on the degree of the graph. We note that unbiased RBNs with homogeneous degree k = 2 are critical and those with k > 2 are chaotic, making them among the most difficult types of Boolean networks to control. In this respect, the results of Fig. 4a provide an upper bound on the size of the minimal driver set required to control homogeneous networks with given size and average degree. Introducing a bias in the output of the Boolean functions makes the network more controllable than an unbiased RBN, thus leading to a reduction in the size of the minimal driver set (see Fig. S10). This is because biased networks are easy to control toward their biased attractors; controlling them toward other attractors, however, may be more difficult.
We also systematically apply our unconstrained optimization algorithm to the 74 networks of the Cell Collective repository^{38}. We measure node influence after t = 10 iterations of the IBMFA, having verified on some networks that this value well represents the entropy of the long-term dynamics of the system (see Fig. S11). The results of Fig. 4b indicate that there is no apparent correlation (Pearson’s R = −0.22, p = 0.06) between the relative size of the optimal driver set and the size of the network. We also verified that the relative size of the driver set does not correlate well with other topological and dynamical features of the networks, such as average degree, average bias, and mean effective connectivity^{16} (see Figs. S12 and S13). Based on the analysis of the entire Cell Collective corpus, we compute the probability of a node belonging to the minimal driver set conditioned on its in- and out-degree; see Fig. S14. As expected, nodes with no inputs, i.e., with in-degree equal to zero, are always part of the driver set, as these nodes cannot be controlled other than via external input. For non-null in-degree values, nodes with sufficiently large in- or out-degree are significantly more likely to be in the minimal driver set than nodes with small in- or out-degree. This indicates that topologically central nodes are likely to be part of the minimal driver set.
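The conditional probability of belonging to the minimal driver set given the in-degree can be estimated by pooling nodes across networks, as sketched below. The input format (one mapping from node to in-degree plus one set of driver nodes per network) and the example data are hypothetical; the same function works for out-degree by passing out-degree mappings instead.

```python
from collections import defaultdict

def driver_probability_by_degree(networks):
    """networks: iterable of (degree, drivers) pairs, where degree maps node -> degree
    and drivers is the set of nodes in that network's minimal driver set."""
    counts = defaultdict(lambda: [0, 0])  # degree -> [drivers, total]
    for degree, drivers in networks:
        for node, k in degree.items():
            counts[k][1] += 1
            if node in drivers:
                counts[k][0] += 1
    return {k: d / t for k, (d, t) in sorted(counts.items())}

# Hypothetical example: two small networks with their in-degrees and driver sets
example = [
    ({"a": 0, "b": 1, "c": 2}, {"a", "c"}),
    ({"x": 0, "y": 2, "z": 1}, {"x"}),
]
print(driver_probability_by_degree(example))  # {0: 1.0, 1: 0.0, 2: 0.5}
```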
Overall, we find that optimal sets of drivers contain fewer than 30% of the nodes for more than 80% of the networks in the repository (Fig. 4c). We compare our predictions with those by Borriello and Daniels of the average size of the minimal driver sets toward the fixed-point attractors of a network^{26}. Because their method requires prior knowledge of the fixed points of the network, which is computationally expensive to obtain, comparisons are possible only on networks of relatively small size. As the results of Fig. 4d show, we find excellent agreement between the average size of driver sets predicted by our method and the predictions by Borriello and Daniels. As additional validation, we verify that the minimum-size driver set obtained with our method via unconstrained optimization is consistently smaller than the average size of the minimal driver sets.
In the SI, we apply our unconstrained optimization algorithm to search for the minimal driver set under the assumption that the system’s dynamics is governed by updating schemes other than synchronous deterministic updates (see Fig. S15). For the Drosophila melanogaster SPN, the outcomes of the analysis are almost identical to those obtained under synchronous updating. However, results for the yeast Saccharomyces cerevisiae cell-cycle network indicate that the long-term behavior of the network is quite sensitive to the specific updating scheme at hand. This observation is in line with the findings of ref. ^{41} on how the size of the basin of attraction of the fixed points changes with the updating scheme. We also apply constrained optimization toward given fixed points under the various updating schemes (see Figs. S16–S17). We find that the same nodes that drive the network to specific attractors under synchronous update are able to do so under asynchronous update as well.
Discussion
In this paper, we generalize approaches typically considered in the study of spreading processes to Boolean dynamics.
First, we develop an individual-based mean-field approximation (IBMFA) for Boolean network dynamics. The approximation neglects dynamical correlations between Boolean variables but fully accounts for the topology and the dynamical rules of the network at hand. On sparse networks, the approximation allows one to compute average trajectories in a time that grows linearly with the system size.
Second, we leverage the IBMFA to measure the dynamical influence of nodes in Boolean networks. We measure the influence of a set of seed nodes in terms of the entropy associated with the long-term configurations that result from perturbing that set of seed nodes. Perturbations consist of pinning the Boolean state of the seed nodes to a given value during the dynamics of the system, while all other nodes start from a maximally uncertain initial state. High entropy values indicate that many configurations are possible; low entropy values indicate that only a few configurations are reachable; null entropy indicates that the seed set drives the dynamics toward a single configuration. We validate this metric of influence on the Drosophila melanogaster segment polarity network (SPN). Further, we reproduce known anticancer effects of various drugs in combination with Alpelisib on the estrogen receptor-positive (ER+) breast cancer network.
Third, we deploy a greedy selection process to find minimal driver sets that reach specific attractors in Boolean networks. We validate the method by retrieving known attractors of the Drosophila melanogaster SPN and the yeast cell-cycle network, as verified by brute-force computation. We then use the method to find minimal driver sets that control a network toward an unconstrained attractor in random Boolean networks (RBNs) and in biochemical regulation and signaling networks from the Cell Collective repository. Although there is no guarantee of finding the largest attractor basin, the attractor found by our unconstrained greedy selection process is likely to have a large basin of attraction, as it requires a minimal number of nodes to be perturbed; it can thus be considered the easiest attractor to reach from random initial conditions and perturbations. Interestingly, we see no relation between the relative driver set size and the system size, indicating that control to an unspecified attractor (i.e., maximum influence) depends on the specific nature of the network dynamics. In the Cell Collective repository, we find that 65% of the networks can be controlled by less than 20% of their nodes, and that 80% can be controlled by less than 30% of their nodes. These figures are similar to previous estimates of driver set sizes in the more general problem of full attractor controllability^{23}. The implications of these results are not immediate and require further analysis. Via unconstrained optimization, we likely find driver sets toward attractors with a large basin of attraction. In some networks, attractors with a large basin are regarded as biologically meaningful^{25,36}. In other networks, however, such attractors need not correspond to the most relevant forms of biological control.
Our greedy selection algorithm scales cubically with the system size and exponentially with the maximum degree of the network, making it applicable to medium-sized networks as long as they are sufficiently sparse.
Other scalable methods exist to identify the nodes that most influence nonlinear dynamics^{16,40,47}, but these methods do not, in general, tell how to control the dynamics toward a specific attractor. Unlike logical inference methods^{16,40}, the IBMFA is not guaranteed to find the specific causal pathways that determine dynamical behavior, but it does allow one to estimate the state of every node in the network by averaging over all configurations allowed by pinning the seeds. This yields a detailed description of the network’s state from maximally uncertain initial conditions. It is also an advantage over structure-only methods, which are scalable but may not predict the dynamics of the network well^{17}.
All methods developed in this paper can be immediately extended to deal with arbitrary updating schemes, e.g., deterministic asynchronous, stochastic asynchronous, and block deterministic updating schemes^{41,42,43}. While fixed points are invariant to the choice of the updating scheme, the size of the basin of attraction of a fixed point is generally affected by the specific rules of the dynamics at hand^{41,43}. This is also apparent from the application of our methods. For example, we find that unconstrained optimization leads to the identification of different fixed points for different updating schemes in the yeast Saccharomyces cerevisiae cell-cycle network. On the other hand, results of our methods applied to the Drosophila melanogaster SPN are almost identical for all the updating schemes we considered. A better understanding of the robustness of minimal driver sets against changes of the updating rules requires further investigation.
The methods developed in this paper have some limitations. The computational time of the unconstrained optimization algorithm grows cubically with the system size. This is certainly an improvement over several existing methods, yet it restricts the analysis to relatively small systems. Further, constrained optimization requires prior knowledge of the targeted fixed points. Acquiring information on all attractors generally requires a time that grows exponentially with the system size, which clearly limits the applicability of our method. However, we note that many biological networks have known attractors related to specific phenotypes, and our method can be applied to these without having to compute the entire attractor landscape, which is an advantage over methods that require this calculation (e.g., refs. ^{25,26}); as with these methods, attractors can be found via sampling for networks that are too large for exhaustive computation. Finally, as currently formulated, our method applies to fixed points only, not to limit cycles. The method can be generalized to these more complicated attractors, but only via a non-trivial generalization of our currently proposed metric of dynamical influence.
In spite of the above limitations, there are some immediate extensions of this work that deserve future attention. For example, our methods can be easily adapted to networks with more than two states per node. Also, our definition of dynamical influence and our algorithm for the selection of optimal driver sets can be extended to study the effect of short-term perturbations, i.e., the state of the seeds is only initially set to a given value but can then be altered by the dynamics of the network. Finally, our method may be used in a variety of applications to approximate node influence in Boolean networks that are too large for exact computation.
Methods
Boolean networks
We consider a deterministic, multivariate, discrete-time dynamical system whose interactions are represented as a graph \(\mathcal{G}\) composed of N nodes. Full information about the network topology is contained in the N × N adjacency matrix A, whose generic element A_{ij} = 1 if there is a directed connection from node i to node j, and A_{ij} = 0 otherwise. Note that the network is directed, in the sense that, in general, A_{ij} ≠ A_{ji}. Self-loops are allowed. The network topology specifies the dependencies among variables in the definition of the dynamical system. Specifically, at the generic instant of time t, each node i is associated with a binary state variable σ_{i}(t) ∈ {0, 1}, and the value of σ_{i}(t) is fully determined by the values of the state variables of the neighbors of node i at time t − 1. We can write
$$\sigma_{i}(t)=F_{i}\left(\vec{\sigma}_{\mathcal{N}_{i}}(t-1)\right),$$(1)
where \(\vec{\sigma}_{\mathcal{N}_{i}}(t-1)=[\sigma_{j_{1}^{(i)}}(t-1),\ldots,\sigma_{j_{k_{i}}^{(i)}}(t-1)]\) is the vector representing the configuration of the system restricted to the neighborhood \(\mathcal{N}_{i}=\{j_{1}^{(i)},\ldots,j_{k_{i}}^{(i)}\}=\{j\in\mathcal{G}\,|\,A_{ji}=1\}\) of node i, and k_{i} is the in-degree, or simply the degree, of node i. F_{i}(⋅) is the binary-valued activation function of node i and fully determines the rules of the dynamics of variable σ_{i}. The rules F_{i}(⋅) are static and do not evolve in time.
Given an initial condition \(\overrightarrow{\sigma}(t=0)=[\sigma_{1}(t=0),\sigma_{2}(t=0),\ldots,\sigma_{N}(t=0)]\), the dynamics of the system consists of iterating the deterministic rules of Eq. (1). All results reported in the main paper are obtained under the synchronous updating scheme, where all variables are updated simultaneously. The above formalization, however, also applies to other updating schemes, e.g., deterministic asynchronous, stochastic asynchronous, and block deterministic updating schemes^{41,42,43}. Depending on the functions F_{i}(⋅) and the initial condition \(\overrightarrow{\sigma}(t=0)\), different long-term behaviors can be observed, including absorbing configurations (fixed points) and limit cycles. Fixed points of the network are insensitive to the specific updating scheme considered; however, the size of their basins of attraction is affected by it.
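As an illustration of Eq. (1), the Python sketch below builds the neighborhoods \(\mathcal{N}_{i}\) from the adjacency matrix and iterates the synchronous update. It is not the authors' released code: the truth-table encoding (tables[i] lists the output of F_{i} for every input configuration, read as a binary number with the first listed input as the most significant bit) and all function names are assumptions made for this example.

```python
import numpy as np


def in_neighbors(A):
    """Neighborhoods N_i = {j : A[j, i] = 1}, i.e., the inputs of each node i."""
    A = np.asarray(A)
    return [[int(j) for j in np.flatnonzero(A[:, i])] for i in range(A.shape[0])]


def sync_step(state, inputs, tables):
    """One synchronous update of Eq. (1): sigma_i(t) = F_i(sigma_{N_i}(t - 1)).

    tables[i] holds the 2**k_i outputs of F_i (hypothetical encoding); the input
    configuration is read as a binary number, first listed input = most significant bit.
    """
    state = np.asarray(state, dtype=int)
    new = np.empty_like(state)
    for i, nbrs in enumerate(inputs):
        idx = 0
        for j in nbrs:
            idx = (idx << 1) | int(state[j])
        new[i] = tables[i][idx]
    return new


def simulate(state, inputs, tables, steps):
    """Trajectory sigma(0), sigma(1), ..., sigma(steps) under synchronous updates."""
    traj = [np.asarray(state, dtype=int)]
    for _ in range(steps):
        traj.append(sync_step(traj[-1], inputs, tables))
    return traj
```

For instance, two mutually inhibiting nodes (each computing NOT of the other) have the fixed point [1, 0], whereas the initial condition [0, 0] falls into the limit cycle [0, 0] → [1, 1] → [0, 0].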
Individualbased meanfield approximation
Given a network and a set of Boolean functions describing the dynamics of the individual nodes, a brute-force analysis of the system would require considering all possible configurations and the effect of the functions F_{i}(⋅) on those configurations. This would allow one to build a deterministic transition matrix between the 2^{N} possible configurations, thus describing all possible trajectories of the system in terms of a state-transition graph. Clearly, such a brute-force approach does not scale with the system size and is therefore of little use for systematic analyses. We propose here a way to approximate the system dynamics by assuming that the expectation values of the various dynamical variables are uncoupled. Specifically, we consider the probability s_{i}(t) = P(σ_{i}(t) = 1) and use Eq. (1) to write
$$s_{i}(t)=\sum_{\vec{n}_{\mathcal{N}_{i}}\in\{0,1\}^{k_{i}}}\delta_{1,F_{i}(\vec{n}_{\mathcal{N}_{i}})}\prod_{j\in\mathcal{N}_{i}}\left[s_{j}(t-1)\right]^{n_{j}}\left[1-s_{j}(t-1)\right]^{1-n_{j}}.$$(2)
In words, the probability s_{i}(t) = P(σ_{i}(t) = 1) that node i is found in the state σ_{i}(t) = 1 at time t is given by a sum over all \(2^{k_{i}}\) possible input configurations of the function F_{i}(⋅). Configurations are enumerated using k_{i} binary variables n_{j}. Among those configurations, non-zero contributions to the sum arise only when \(F_{i}(\vec{n}_{\mathcal{N}_{i}})=1\). This is encoded by the term \(\delta_{1,F_{i}(\vec{n}_{\mathcal{N}_{i}})}\), where δ_{x,y} is the Kronecker delta, defined as δ_{x,y} = 1 if x = y and δ_{x,y} = 0 otherwise. Each configuration in the sum has a weight equal to a product of marginal probabilities; the sum is therefore based on the approximation that the states of all nodes involved in the input configuration of F_{i}(⋅) are independent of each other. For example, if node i has only three neighbors, j, k, and i itself, the hypothetical configuration (n_{i} = 1, n_{j} = 0, n_{k} = 1) such that F_{i}(1, 0, 1) = 1 contributes s_{i}(t − 1) [1 − s_{j}(t − 1)] s_{k}(t − 1) to the sum. Under the individual-based mean-field approximation (IBMFA), the probability of observing a configuration \(\overrightarrow{\sigma}(t)\) given the values of the probabilities \(\overrightarrow{s}(t)\) is
$$P\left(\overrightarrow{\sigma}(t)\,|\,\overrightarrow{s}(t)\right)=\prod_{i=1}^{N}\left[s_{i}(t)\right]^{\sigma_{i}(t)}\left[1-s_{i}(t)\right]^{1-\sigma_{i}(t)}.$$(3)
We note that if we set s_{i}(t) = σ_{i}(t) ∈ {0, 1} for all nodes i in the network, then \(\overrightarrow{\sigma}(t)\) is the only configuration with non-zero probability in Eq. (3), and Eq. (2) reduces to Eq. (1).
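A direct transcription of Eq. (2) is sketched below. The truth-table encoding is a hypothetical choice for this example: tables[i][idx] is the output of F_{i} for the idx-th input configuration, enumerated by itertools.product with the first input as the most significant bit. The loop over the \(2^{k_{i}}\) configurations is the source of the exponential dependence on the node degree.

```python
import itertools

import numpy as np


def ibmfa_step(s, inputs, tables):
    """One IBMFA iteration, Eq. (2): s_i(t) from the marginals s_j(t - 1).

    Only configurations n with F_i(n) = 1 contribute (the Kronecker delta), each
    weighted by prod_j s_j^{n_j} (1 - s_j)^{1 - n_j} (independence assumption).
    """
    s = np.asarray(s, dtype=float)
    new = np.zeros_like(s)
    for i, nbrs in enumerate(inputs):
        configs = itertools.product((0, 1), repeat=len(nbrs))
        for idx, n in enumerate(configs):
            if tables[i][idx] != 1:
                continue  # delta_{1, F_i(n)} = 0: no contribution
            weight = 1.0
            for j, nj in zip(nbrs, n):
                weight *= s[j] if nj == 1 else 1.0 - s[j]
            new[i] += weight
    return new
```

With identity self-loops on nodes 0 and 1 and node 2 computing AND of nodes 0 and 1, starting from s = (1/2, 1/2, 1/2) gives s_{2} = 1/4 after one step, while 0/1 inputs reproduce the deterministic update of Eq. (1).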
Error metric for the individualbased meanfield approximation
The error of the IBMFA is assessed by comparing the approximation to R simulations of the network dynamics, each started from a random initial configuration drawn from the probability distribution of Eq. (3). Specifically, we first estimate the average value
$$\bar{\sigma}_{i}(t)=\frac{1}{R}\sum_{r=1}^{R}\sigma_{i}^{(r)}(t),$$(4)
where \(\sigma_{i}^{(r)}(t)\) indicates the state of node i at time t in the r-th simulation. We then evaluate the mean squared error of the prediction as
with s_{i}(t) the solution of Eq. (2). The baseline value for the mean squared error of the IBMFA prediction is given by
Eq. (6) quantifies the variance of the sampled trajectories that are used to estimate the ground-truth average trajectory with Eq. (4).
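The validation procedure can be sketched as follows: R synchronous runs start from initial states sampled independently with P(σ_{i}(0) = 1) = s_{i}(0), as in Eq. (3), and their average gives Eq. (4). The exact normalization of Eqs. (5) and (6) is not reproduced here; the hypothetical helper mse_per_time assumes the squared error is averaged over the N nodes at each time step. The truth-table convention, seeding, and function names are illustrative assumptions.

```python
import numpy as np


def _sync_step(state, inputs, tables):
    # Synchronous update of Eq. (1); table index reads inputs as binary (first = MSB).
    new = np.empty_like(state)
    for i, nbrs in enumerate(inputs):
        idx = 0
        for j in nbrs:
            idx = (idx << 1) | int(state[j])
        new[i] = tables[i][idx]
    return new


def sample_mean_trajectory(s0, inputs, tables, T, R, seed=0):
    """Monte Carlo estimate of Eq. (4) from R runs with independent random initial states."""
    rng = np.random.default_rng(seed)
    s0 = np.asarray(s0, dtype=float)
    total = np.zeros((T + 1, s0.size))
    for _ in range(R):
        state = (rng.random(s0.size) < s0).astype(int)  # sigma_i(0) = 1 with prob. s_i(0)
        total[0] += state
        for t in range(1, T + 1):
            state = _sync_step(state, inputs, tables)
            total[t] += state
    return total / R


def mse_per_time(s_pred, sigma_bar):
    """Squared IBMFA error per time step, averaged over nodes (assumed normalization)."""
    diff = np.asarray(s_pred, dtype=float) - np.asarray(sigma_bar, dtype=float)
    return np.mean(diff ** 2, axis=1)
```

For two mutually inhibiting nodes started from s(0) = (1/2, 1/2), the IBMFA predicts s_{i}(t) = 1/2 at all times, and the sampled average approaches this value as R grows.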
Entropy of network configurations
We measure the uncertainty of a Boolean network as the normalized entropy of the probability distribution associated with its possible configurations, namely
$$H(\overrightarrow{s})=\frac{1}{N}\sum_{i=1}^{N}h_{2}(s_{i}),$$(7)
with h_{2}(s) the binary entropy function, i.e., \(h_{2}(s)=-s\,\log_{2}(s)-(1-s)\,\log_{2}(1-s)\). Note that Eq. (7) approximates the true entropy of the system from above, as it assumes independence among the dynamical variables of the individual nodes. We note that \(H(\overrightarrow{s})\in[0,1]\). Maximum entropy is reached for s_{i} = 1/2 for all i. Null entropy is measured for deterministic configurations, s_{i} ∈ {0, 1} for all i.
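The normalized entropy translates into a few lines of NumPy. The sketch below, with hypothetical function names, uses the usual convention 0 log 0 = 0 so that deterministic nodes contribute zero entropy.

```python
import numpy as np


def binary_entropy(s):
    """h_2(s) = -s log2 s - (1 - s) log2(1 - s), with 0 log 0 = 0."""
    s = np.clip(np.asarray(s, dtype=float), 0.0, 1.0)
    out = np.zeros_like(s)
    mask = (s > 0) & (s < 1)  # nodes with s_i in {0, 1} keep zero entropy
    p = s[mask]
    out[mask] = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    return out


def network_entropy(s):
    """Normalized entropy H(s) = (1/N) sum_i h_2(s_i), in bits per node."""
    return float(np.mean(binary_entropy(s)))
```

For example, network_entropy([0.5, 0.0]) returns 0.5: one maximally uncertain node and one deterministic node.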
Definition of seed set
We define the set of seed nodes \(\mathcal{X}=\{(x_{1},\hat{\sigma}_{x_{1}}),(x_{2},\hat{\sigma}_{x_{2}}),\ldots,(x_{|\mathcal{X}|},\hat{\sigma}_{x_{|\mathcal{X}|}})\}\) as the set of nodes and their known, invariant states, i.e., \(\sigma_{i}(t)=s_{i}(t)=\hat{\sigma}_{i}\in\{0,1\}\) for all \((i,\hat{\sigma}_{i})\in\mathcal{X}\) and for all t ≥ 0. Note that we tacitly assume that each node i contributes at most one element to the set \(\mathcal{X}\), either \((i,\hat{\sigma}_{i}=0)\) or \((i,\hat{\sigma}_{i}=1)\).
The state of all nodes not belonging to the set \(\mathcal{X}\) is uncertain, i.e., 0 ≤ s_{i}(t) ≤ 1 for all \(i\notin\mathcal{X}\). Unless noted otherwise, we focus on the case of maximal uncertainty for the initial state of the non-pinned nodes, i.e., s_{i}(0) = 1/2 for all \(i\notin\mathcal{X}\). The probability of starting from the configuration \(\overrightarrow{\sigma}(0)\) given \(\overrightarrow{s}(0)\) still obeys Eq. (3), with the additional constraint that \(s_{i}(0)=\hat{\sigma}_{i}\) for \(i\in\mathcal{X}\).
Influence maximization
We propose a greedy algorithm for the quasi-optimal selection of the smallest set of nodes that should be pinned in order to control the dynamics of a Boolean network toward zero entropy. As in standard greedy optimization techniques, our strategy consists of pinning one node at each stage of the algorithm; the selected seed is the best choice that can be made at that particular stage.
Let \(\mathcal{X}_{v}=\{(b_{1},\hat{\sigma}_{b_{1}}),(b_{2},\hat{\sigma}_{b_{2}}),\ldots,(b_{v},\hat{\sigma}_{b_{v}})\}\) denote the set of pinned nodes at the v-th stage of the algorithm. We initialize the algorithm at stage v = 0 with \(\mathcal{X}_{0}=\emptyset\). Then, we set v = 1 and follow the procedure:

1. Select the best seed \((b_{v},\hat{\sigma}_{b_{v}})\) of stage v of the algorithm according to
$$(b_{v},\hat{\sigma}_{b_{v}})=\arg\min_{(i,\hat{\sigma}_{i})\notin\mathcal{X}_{v-1}}H\left(\overrightarrow{s}(T)\,|\,\mathcal{X}_{v-1}\cup(i,\hat{\sigma}_{i})\right).$$(8)
2. Add \((b_{v},\hat{\sigma}_{b_{v}})\) to the set of pinned nodes \(\mathcal{X}_{v-1}\), i.e., \(\mathcal{X}_{v}=\mathcal{X}_{v-1}\cup(b_{v},\hat{\sigma}_{b_{v}})\).
3. Increase v → v + 1, and go back to step 1.
The above algorithm is iterated until a stage v^{*} is reached such that \(H(\overrightarrow{s}(T)\,|\,\mathcal{X}_{v^{*}})=0\). At that point, the set of pinned nodes \(\mathcal{X}_{v^{*}}\) is able to fully control the dynamics of the network toward a particular attractor, and \(\mathcal{X}_{v^{*}}\) is the optimal driver set according to our recipe. The criterion of Eq. (8) prescribes the selection of the best seed as the one that, if added to the existing seed set, leads to the minimum resulting conditional entropy of the network. Conditional entropy is measured after T stages of the dynamics. In particular, we approximate it via the solution of the IBMFA Eq. (2), where we impose s_{i}(t = 0) = 1/2 for all non-pinned nodes.
The resulting driver set is post-processed to possibly reduce its size. One node at a time is removed from the set as long as the resulting entropy remains equal to zero, i.e., the element \((i,\hat{\sigma}_{i})\in\mathcal{X}\) can be removed from the set of drivers \(\mathcal{X}\) only if \(H(\overrightarrow{s}(T)\,|\,\mathcal{X}\setminus(i,\hat{\sigma}_{i}))=0\). This post-processing step serves to correct suboptimal choices potentially made by the greedy optimization algorithm.
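The greedy procedure of Eq. (8), together with the post-processing step, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the truth-table convention (first input as the most significant bit), the function names, and the numerical tolerance tol used to decide when the IBMFA entropy counts as zero are all assumptions. Ties in Eq. (8) are broken by taking the first candidate encountered, and T = 10 mirrors the number of IBMFA iterations used for the Cell Collective analysis. The optional candidates argument restricts which pairs \((i,\hat{\sigma}_{i})\) may be pinned; passing the pairs of a target fixed point gives the constrained variant described in the next paragraph.

```python
import itertools

import numpy as np


def ibmfa_entropy(seeds, inputs, tables, T):
    """H(s(T) | seeds): iterate Eq. (2) T times from s_i(0) = 1/2, keeping seeds pinned."""
    N = len(inputs)
    s = np.full(N, 0.5)  # maximal uncertainty for non-pinned nodes
    for i, v in seeds.items():
        s[i] = v
    for _ in range(T):
        new = np.zeros(N)
        for i, nbrs in enumerate(inputs):
            if i in seeds:
                new[i] = seeds[i]  # pinned nodes never change
                continue
            for idx, n in enumerate(itertools.product((0, 1), repeat=len(nbrs))):
                if tables[i][idx] == 1:
                    new[i] += np.prod([s[j] if nj else 1.0 - s[j] for j, nj in zip(nbrs, n)])
        s = new
    p = s[(s > 0) & (s < 1)]  # deterministic nodes contribute zero entropy
    return float(np.sum(-p * np.log2(p) - (1 - p) * np.log2(1 - p)) / N)


def greedy_drivers(inputs, tables, T=10, candidates=None, tol=1e-9):
    """Greedy seed selection (Eq. (8)) followed by pruning of redundant seeds."""
    N = len(inputs)
    if candidates is None:  # unconstrained: any node, pinned to 0 or 1
        candidates = [(i, v) for i in range(N) for v in (0, 1)]
    seeds = {}
    H = ibmfa_entropy(seeds, inputs, tables, T)
    while H > tol:
        options = [(i, v) for i, v in candidates if i not in seeds]
        if not options:  # nothing left to pin: zero entropy not reached
            break
        i, v = min(options, key=lambda c: ibmfa_entropy({**seeds, c[0]: c[1]}, inputs, tables, T))
        seeds[i] = v
        H = ibmfa_entropy(seeds, inputs, tables, T)
    # Post-processing: drop a seed whenever the entropy stays zero without it
    for i in list(seeds):
        trial = {j: v for j, v in seeds.items() if j != i}
        if ibmfa_entropy(trial, inputs, tables, T) <= tol:
            seeds = trial
    return seeds
```

For a chain 0 → 1 → 2 in which node 0 holds its own state and the other nodes copy their input, pinning node 0 alone already drives the network to zero entropy, so the sketch returns {0: 0}.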
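To select nodes that drive the dynamics to the attractor \(\Theta=\{(1,\tilde{\sigma}_{1}),(2,\tilde{\sigma}_{2}),\ldots,(N,\tilde{\sigma}_{N})\}\), the same greedy selection procedure as above is used, except that the candidate elements are only those compatible with Θ, i.e., Eq. (8) is replaced by
$$(b_{v},\hat{\sigma}_{b_{v}})=\arg\min_{(i,\hat{\sigma}_{i})\in\Theta\setminus\mathcal{X}_{v-1}}H\left(\overrightarrow{s}(T)\,|\,\mathcal{X}_{v-1}\cup(i,\hat{\sigma}_{i})\right).$$(9)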
A post-processing step to reduce the size of the driver set is applied in the same manner as described for the unconstrained greedy optimization algorithm.
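Constrained optimization needs the target fixed point Θ in advance. For small networks, fixed points can be listed by exhaustive enumeration of the 2^{N} configurations under Eq. (1). The sketch below does this, assuming the same hypothetical truth-table convention (first input as the most significant bit), and returns each fixed point as a mapping i → σ̃_{i}, i.e., the pairs \((i,\tilde{\sigma}_{i})\) of Θ. Its cost grows exponentially with N, which is the limitation of constrained optimization noted in the Discussion.

```python
import itertools


def fixed_points(inputs, tables):
    """All configurations sigma with F_i(sigma_{N_i}) = sigma_i for every node i."""
    N = len(inputs)
    found = []
    for state in itertools.product((0, 1), repeat=N):  # 2**N configurations
        is_fixed = True
        for i, nbrs in enumerate(inputs):
            idx = 0
            for j in nbrs:
                idx = (idx << 1) | state[j]
            if tables[i][idx] != state[i]:
                is_fixed = False
                break
        if is_fixed:
            found.append({i: state[i] for i in range(N)})
    return found
```

Two mutually inhibiting nodes, for instance, have exactly the two fixed points (0, 1) and (1, 0), while a single self-inhibiting node has none.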
The computational time of the algorithm scales cubically with the system size, i.e., O(N^{3}) (see Fig. S18). This can be understood as follows. At stage v of the algorithm, one needs to evaluate the entropy \(H(\overrightarrow{s}(T)\,|\,\mathcal{X}_{v-1}\cup(i,\hat{\sigma}_{i}))\) appearing on the right-hand side of Eq. (8) for each of the N − v + 1 nodes i in the network that are not yet part of the seed set \(\mathcal{X}_{v-1}\). Evaluating \(H(\overrightarrow{s}(T)\,|\,\mathcal{X}_{v-1}\cup(i,\hat{\sigma}_{i}))\) requires a time that grows linearly with N, as one needs to iterate the N IBMFA Eqs. (2). To find a driver set, the algorithm is iterated v^{*} times, where v^{*} grows linearly with N, given that the size of a driver set is generally proportional to the system size.
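The counting argument above can be summarized as follows, assuming two candidate pinned states per node, a fixed number T of IBMFA iterations, and a cost proportional to \(k_{i}2^{k_{i}}\) for evaluating Eq. (2) at node i:

$$\mathrm{cost}\simeq\sum_{v=1}^{v^{*}}2\,(N-v+1)\,C_{\mathrm{IBMFA}},\qquad C_{\mathrm{IBMFA}}=O\Big(T\sum_{i=1}^{N}k_{i}\,2^{k_{i}}\Big)=O\big(T\,N\,k_{\max}\,2^{k_{\max}}\big).$$

With v^{*} ∝ N, the sum over stages contributes O(N^{2}) entropy evaluations, giving O(T k_{max} 2^{k_{max}} N^{3}) overall: cubic in the system size and exponential in the maximum degree, consistent with the scaling stated in the Discussion.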
Networks analyzed in the paper
Random Boolean networks
Random Boolean networks (RBNs) are extensively studied networks with well-known theoretical properties. RBNs are special cases of Boolean networks where both the connections between nodes and the transfer functions governing node updates are random. We consider RBNs under synchronous update, as in the traditional literature^{11,39,48}.
In our model, the network has N nodes and every node has exactly k neighbors, i.e., in-degree k; we further ensure that there are no isolated nodes. For k = 1, we use a directed ring structure; for k = 2, we use an undirected ring structure; for k ≥ 3, we generate random connections between the nodes. Activation functions are generated by assigning a random output value, either 0 or 1 with equal probability, to the function F_{i} of Eq. (1) for each of its 2^{k} possible arguments, independently for every node i.
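A minimal sketch of this RBN construction is given below. The function name, the use of NumPy's default_rng, the truth-table encoding, and the choice to allow a node to appear among its own k ≥ 3 random inputs (self-loops are permitted in our formalism) are assumptions of this example rather than details taken from the paper. Since every node receives exactly k ≥ 1 inputs, no node can be isolated.

```python
import numpy as np


def random_boolean_network(N, k, seed=0):
    """Hypothetical RBN generator: in-degree k for every node, random truth tables."""
    rng = np.random.default_rng(seed)
    if k == 1:
        inputs = [[(i - 1) % N] for i in range(N)]  # directed ring
    elif k == 2:
        inputs = [[(i - 1) % N, (i + 1) % N] for i in range(N)]  # undirected ring (N >= 3)
    else:
        # k distinct random inputs per node; a node may be one of its own inputs
        inputs = [[int(j) for j in rng.choice(N, size=k, replace=False)] for _ in range(N)]
    # Each of the 2**k entries of every truth table is 0 or 1 with probability 1/2
    tables = [[int(b) for b in rng.integers(0, 2, size=2 ** k)] for _ in range(N)]
    return inputs, tables
```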
Gene regulatory networks
Boolean and multistate networks have been used to successfully model biological processes such as cell-fate determination, cell-cycle regulation, and cancer development^{7,35,36,49}. Such gene regulatory networks (GRNs) represent relevant components of a cell (e.g., a protein or gene) as nodes, which are connected if one component has a regulatory (activating or inhibitory) effect on another. Boolean node states describe whether the activity of that component is above or below a relevant threshold. The attractors of the network mimic actual stable states of biological interest, such as wild-type or mutant phenotypes. These models are particularly useful in cases where the kinetic parameters of biological components have not been established or where phenotypes of interest can be recovered without such parameters^{50}.
We use several such models in this work. The first describes the body segmentation of Drosophila melanogaster^{34}. The single-cell segment polarity network (SPN) consists of N = 17 nodes, three of which are external signals to the cell that have no inputs themselves. This network has been well studied, and all of its 10 attractors are known. The second model describes signal transduction in estrogen receptor-positive (ER+) breast cancer^{35}. This network consists of N = 80 nodes, and its attractor landscape is too large to be fully described; however, the network contains several pathways of biological interest that can be manipulated by 7 external drug nodes. An active drug node indicates that the drug is present, while an inactive one indicates that the drug is absent.
We also explore the Drosophila melanogaster parasegment network (N = 60), which is equivalent to four interconnected single-cell SPN models, where each cell has 15 nodes, some of which depend on neighboring cells and one of which is an external signal^{34}. Although the complete attractor landscape of this network is largely unexplored, several biologically relevant attractors are known, representing the wild-type, wild-type variant, ectopic, ectopic variant, broad stripes, and no segmentation phenotypes. In addition, we analyze the T cell large granular lymphocyte (T-LGL) leukemia network (N = 60), which describes T cell survival signaling in leukemia^{37}, and the yeast Saccharomyces cerevisiae cell-cycle network (N = 12), which describes cell-cycle regulation in budding yeast^{36}. Finally, we use the Cell Collective repository, an open-source collection of biological signaling and regulatory networks^{38}, as accessed on August 5th, 2020.
All results of the main paper have been obtained assuming that the system evolves under synchronous updates. In the SI, we study the dynamical properties of the Drosophila melanogaster SPN and the yeast Saccharomyces cerevisiae cell-cycle network under deterministic asynchronous, stochastic asynchronous, and block deterministic updating schemes^{41,42,43}.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
Data availability
Network data considered in this paper have been downloaded from the publicly accessible repositories https://github.com/rionbr/CANA/tree/master/cana/datasets and https://cellcollective.org.
Code availability
The code developed for this paper is made available at https://doi.org/10.5281/zenodo.6581810^{51}.
References
Lü, L. et al. Vital nodes identification in complex networks. Phys. Rep. 650, 1–63 (2016).
Morone, F. & Makse, H. A. Influence maximization in complex networks through optimal percolation. Nature 524, 65–68 (2015).
Bai, W.-J., Zhou, T. & Wang, B.-H. Immunization of susceptible–infected model on scale-free networks. Physica A 384, 656–662 (2007).
Summer, M. Financial contagion and network analysis. Annu. Rev. Financ. Econ. 5, 277–297 (2013).
Li, S., Assmann, S. M. & Albert, R. Predicting essential components of signal transduction networks: a dynamic model of guard cell abscisic acid signaling. PLoS Biol. 4, e312 (2006).
Saadatpour, A. et al. Dynamical and structural analysis of a t cell survival network identifies novel candidate therapeutic targets for large granular lymphocyte leukemia. PLoS Comput. Biol. 7, e1002267 (2011).
Zañudo, J. G., Steinway, S. N. & Albert, R. Discrete dynamic network modeling of oncogenic signaling: Mechanistic insights for personalized treatment of cancer. Curr. Opin. Systems Biol. 9, 1–10 (2018).
Domingos, P. & Richardson, M. Mining the network value of customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 57–66 (2001).
Kempe, D., Kleinberg, J. & Tardos, É. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 137–146 (2003).
Erkol, Ş., Castellano, C. & Radicchi, F. Systematic comparison between methods for the detection of influential spreaders in complex networks. Sci. Rep. 9, 1–11 (2019).
Gershenson, C. Introduction to random boolean networks. In Workshop and Tutorial Proceedings, Ninth International Conference on the Simulation and Synthesis of Living Systems (ALife IX), 160–173 (2004).
Datta, A., Choudhary, A., Bittner, M. L. & Dougherty, E. R. External control in markovian genetic regulatory networks. Mach. Learn. 52, 169–191 (2003).
Datta, A., Choudhary, A., Bittner, M. L. & Dougherty, E. R. External control in markovian genetic regulatory networks: the imperfect information case. Bioinformatics 20, 924–930 (2004).
Akutsu, T., Hayashida, M., Ching, W.-K. & Ng, M. K. Control of boolean networks: Hardness results and algorithms for tree structured networks. J. Theor. Biol. 244, 670–679 (2007).
Watts, D. J. A simple model of global cascades on random networks. Proc. Natl. Acad. Sci. USA 99, 5766–5771 (2002).
Marques-Pita, M. & Rocha, L. M. Canalization and control in automata networks: body segmentation in drosophila melanogaster. PloS One 8, e55946 (2013).
Gates, A. J. & Rocha, L. M. Control of complex networks requires both structure and dynamics. Sci. Rep. 6, 1–11 (2016).
Liu, Y.-Y. & Barabási, A.-L. Control principles of complex systems. Rev. Mod. Phys. 88, 035006 (2016).
Cheng, D. & Qi, H. Controllability and observability of boolean control networks. Automatica 45, 1659–1667 (2009).
Liu, Y.-Y., Slotine, J.-J. & Barabási, A.-L. Controllability of complex networks. Nature 473, 167–173 (2011).
Fiedler, B., Mochizuki, A., Kurosawa, G. & Saito, D. Dynamics and control at feedback vertex sets. i: Informative and determining nodes in regulatory networks. J. Dyn. Differ. Equ. 25, 563–604 (2013).
Mochizuki, A., Fiedler, B., Kurosawa, G. & Saito, D. Dynamics and control at feedback vertex sets. ii: A faithful monitor to determine the diversity of molecular activities in regulatory networks. J. Theor. Biol. 335, 130–146 (2013).
Zañudo, J. G. T., Yang, G. & Albert, R. Structure-based control of complex networks with nonlinear dynamics. Proc. Natl. Acad. Sci. USA 114, 7234–7239 (2017).
Zanudo, J. G. & Albert, R. Cell fate reprogramming by control of intracellular network dynamics. PLoS Comput. Biol. 11, e1004193 (2015).
Kim, J., Park, S.-M. & Cho, K.-H. Discovery of a kernel for controlling biomolecular regulatory networks. Sci. Rep. 3, 1–9 (2013).
Borriello, E. & Daniels, B. C. The basis of easy controllability in boolean networks. Nat. Commun. 12, 1–15 (2021).
Pastor-Satorras, R., Castellano, C., Van Mieghem, P. & Vespignani, A. Epidemic processes in complex networks. Rev. Mod. Phys. 87, 925 (2015).
Derrida, B. & Pomeau, Y. Random networks of automata: a simple annealed approximation. EPL–Europhys. Lett. 1, 45 (1986).
Kochi, N. & Matache, M. T. Mean-field boolean network model of a signal transduction network. Biosystems 108, 14–27 (2012).
Andrecut, M. Mean field dynamics of random boolean networks. J. Stat. Mech.–Theory E. 2005, P02003 (2005).
Seshadhri, C., Vorobeychik, Y., Mayo, J. R., Armstrong, R. C. & Ruthruff, J. R. Influence and dynamic behavior in random boolean networks. Phys. Rev. Lett. 107, 108701 (2011).
Joy, M. P., Ingber, D. E. & Huang, S. Chaotic mean field dynamics of a boolean network with random connectivity. Int. J. Mod. Phys. C 18, 1459–1473 (2007).
Nemhauser, G. L., Wolsey, L. A. & Fisher, M. L. An analysis of approximations for maximizing submodular set functions I. Math. Program. 14, 265–294 (1978).
Albert, R. & Othmer, H. G. The topology of the regulatory interactions predicts the expression pattern of the segment polarity genes in drosophila melanogaster. J. Theor. Biol. 223, 1–18 (2003).
Zañudo, J. G. T., Scaltriti, M. & Albert, R. A network modeling approach to elucidate drug resistance mechanisms and predict combinatorial drug treatments in breast cancer. Cancer Convergence 1, 5 (2017).
Li, F., Long, T., Lu, Y., Ouyang, Q. & Tang, C. The yeast cell-cycle network is robustly designed. Proc. Natl. Acad. Sci. USA 101, 4781–4786 (2004).
Zhang, R. et al. Network model of survival signaling in large granular lymphocyte leukemia. Proc. Natl. Acad. Sci. USA 105, 16308–16313 (2008).
Helikar, T. et al. The cell collective: toward an open and collaborative approach to systems biology. BMC Syst. Biol. 6, 1–14 (2012).
Kauffman, S. A. Metabolic stability and epigenesis in randomly constructed genetic nets. J. Theor. Biol. 22, 437–467 (1969).
Gates, A. J., Correia, R. B., Wang, X. & Rocha, L. M. The effective graph reveals redundancy, canalization, and control pathways in biochemical regulation and signaling. P. Natl. Acad. Sci. 118, e2022598118 (2021).
Goles, E., Montalva, M. & Ruz, G. A. Deconstruction and dynamical robustness of regulatory networks: application to the yeast cell cycle networks. B. Math. Biol. 75, 939–966 (2013).
Aracena, J., Fanchon, E., Montalva, M. & Noual, M. Combinatorics on update digraphs in boolean networks. Discrete Appl. Math. 159, 401–409 (2011).
Fauré, A., Naldi, A., Chaouiya, C. & Thieffry, D. Dynamical analysis of a generic boolean model for the control of the mammalian cell cycle. Bioinformatics 22, e124–e131 (2006).
Radicchi, F. & Castellano, C. Uncertainty reduction for stochastic processes on complex networks. Phys. Rev. Lett. 120, 198301 (2018).
Braunstein, A., Dall’Asta, L., Semerjian, G. & Zdeborová, L. Network dismantling. Proc. Natl. Acad. Sci. USA 113, 12368–12373 (2016).
Osat, S., Faqeeh, A. & Radicchi, F. Optimal percolation on multiplex networks. Nat. Commun. 8, 1–7 (2017).
Zañudo, J. G. & Albert, R. An effective network reduction approach to find the dynamical repertoire of discrete dynamic networks. Chaos 23, 025111 (2013).
Kauffman, S. A. et al. The Origins of Order: Self-organization and Selection in Evolution. (Oxford University Press, USA, 1993).
Espinosa-Soto, C., Padilla-Longoria, P. & Alvarez-Buylla, E. R. A gene regulatory network model for cell-fate determination during arabidopsis thaliana flower development that is robust and recovers experimental gene expression profiles. The Plant Cell 16, 2923–2939 (2004).
Albert, R. & Thakar, J. Boolean modeling: a logic-based dynamic approach for understanding signaling and regulatory networks and for making useful predictions. Wires Syst. Biol. Med. 6, 353–369 (2014).
Parmer, T., Rocha, L. M. & Radicchi, F. https://doi.org/10.5281/zenodo.6581810 (2022).
Acknowledgements
This material is based upon work supported by the Air Force Office of Scientific Research under award number FA95502110446 (T.P. and F.R.) and by the National Institutes of Health, National Library of Medicine Program, grant 01LM01194501 (L.M.R.). The funders had no role in study design, data collection and analysis, decision to publish, or any opinions, findings, and conclusions or recommendations expressed in the manuscript.
Author information
Contributions
T.P., L.M.R., and F.R. conceived and designed the experiments. T.P. performed the experiments. T.P., L.M.R., and F.R. wrote the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review information
Nature Communications thanks Gonzalo A. Ruz, Bryan Daniels, and the other, anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Parmer, T., Rocha, L. M. & Radicchi, F. Influence maximization in Boolean networks. Nat Commun 13, 3457 (2022). https://doi.org/10.1038/s41467-022-31066-0