An efficient algorithm to identify the optimal one-bit perturbation based on the basin-of-state size of Boolean networks

Boolean networks are widely used to model gene regulatory networks and to design therapeutic intervention strategies to affect the long-term behavior of systems. In this paper, we investigate the less-studied one-bit perturbation, which falls under the category of structural intervention. Previous works focused on finding the optimal one-bit perturbation to maximally alter the steady-state distribution (SSD) of undesirable states through matrix perturbation theory. However, the application of the SSD is limited to Boolean networks with about ten genes. In 2007, Xiao et al. proposed to search the optimal one-bit perturbation by altering the sizes of the basin of attractions (BOAs). However, their algorithm requires close observation of the state-transition diagram. In this paper, we propose an algorithm that efficiently determines the BOA size after a perturbation. Our idea is that, if we construct the basin of states for all states, then the size of the BOA of perturbed networks can be obtained just by updating the paths of the states whose transitions have been affected. Results from both synthetic and real biological networks show that the proposed algorithm performs better than the exhaustive SSD-based algorithm and can be applied to networks with about 25 genes.

The ultimate objective of gene-regulatory-network modeling and analysis is to design effective intervention strategies so that the dynamics of the network evolves toward desirable cellular states. Following the two seminal papers by Shmulevich in 2002 4,5 , the control of gene regulatory networks has been intensively studied within the framework of PBNs. Current control strategies can be classified into two categories: external control and structural intervention. External control "persuades" the system to move toward desirable states by flipping (or not flipping) the value of a control gene or genes over time. In biology, an example of external control is the activation of the well-known tumor-suppressor gene p53 in response to radiation which can rapidly inhibit cell growth and lead to apoptosis in a few hours 5 . The optimal control policies aim to maximally increase the steady-state distribution of desirable states and decrease that of undesirable states. Solving the optimal control problem requires applying a dynamic programming algorithm. However, it quickly becomes computationally infeasible as network size increases because the size of the search space for this optimization problem is O(2 n ). Several approximate and greedy algorithms have been proposed to find suboptimal solutions, such as the mean-first-passage-time control policy, the BOA control policy, the SSD control policy, and the conservative SSD control policy 6 . All these policies aim to reduce the risk of entering undesirable states that correspond to aberrant phenotypes of the modeled cells by some heuristic criterion. Recently, Yousefi et al. proposed to solve the optimal control problem by a linear-programming approach. Depending on whether desirable states are constrained, they presented the unconstrained optimal-intervention policy and the phenotypically constrained optimal-intervention policy, which can lead to the maximal phenotype alteration 7 .
Structural intervention directly changes the underlying network structure (wiring) to alter the long-term behavior (steady state) of the network 8 . In biology, this can be accomplished by introducing a transcription factor or drug that may affect the state of the target genes in some specific situations. For example, in developing countries, estrogen is often taken by women after menopause to slow down the aging process. However, the estrogen dose is critical because an overdose may increase the risk of developing breast or ovarian cancer. The simplest structural-intervention strategy is one-bit perturbation, which alters one output bit of the Boolean function for a specific gene. After a function perturbation, the original state transition matrix and SSD are both changed. The optimal structural intervention is to determine which perturbation on the truth table governing a BNp (where BNP is a Boolean network with perturbation p) or PBN would result in the maximal long-term effect on the dynamic behavior of the network. In 2007, Xiao et al. first studied how perturbation functions impact the BOA of Boolean networks 9 . In 2008 and 2012, Qian et al. adapted perturbation theory in finite Markov chains to derive an analytic solution to compute the shifted mass after a function perturbation 10,11 . In 2011, Bouaynayal formulated optimal intervention as an inverse-perturbation problem that can be solved by using standard convex optimization methods 12 . Although these approaches can find the optimal one-bit perturbation, their application is still limited to networks with about 10 genes. For example, the method in ref. 12 can only deal with BNs consisting of 10 to 15 genes, even when applying the current-best semi-definite program solvers. The main obstacle for methods based on matrix perturbation theory is that operations on the probability transition matrix (2 n × 2 n ) are too time consuming. Structural intervention can permanently change the SSD of the network because the fundamental functions have been changed. However, external intervention changes the SSD only when the control policies are performed. From this point of view, structural intervention has a greater potential impact on the dynamic behavior of networks than external intervention, and gene therapies are more likely to be developed by applying drug or genetic manipulation to alter extant cell behavior.
The probability mass of the SSD of BNs is mainly occupied by their attractor states. To avoid matrix operations, we can directly find the one-bit perturbation that maximally increases the BOA of desirable attractors and reduces the BOA of undesirable attractors. Xiao et al. proposed several algorithms to determine the optimal perturbation function based on the change in the size of the BOA 9 . However, their algorithms are very cumbersome and require closely observing the state transition changes before and after a perturbation. In this paper, we propose an algorithm that quickly determines the size of the BOA after a one-bit perturbation based on the basin of states (BOS) for each state.
The rest of this paper is organized as follows: The definitions, problem setting, and algorithm description are given in Section 2. The experimental results and discussion on both synthetic networks and real biological networks appear in Section 3. Finally, concluding remarks are given in Section 4.

Boolean networks and Probabilistic Boolean Networks.
Each node x i represents the expression state of a gene, where x i = 0 means that the gene is off and x i = 1 means that it is on. To update the node value, each node x i is assigned a Boolean function with k i specific input nodes. Under the synchronous-updating scheme, all genes are updated simultaneously according to their corresponding update functions. The network's state at time t is denoted by a binary vector x(t) = (x 1 (t), … , x n (t)). In the absence of noise, the state of the system at the next time step is In Boolean networks, there are generally two kinds of attractors: singleton attractors and cyclic attractors. The former consists of only one stable state while the latter consists of multiple stable states. Figure 1(A) shows a BN consisting of five genes and the corresponding truth table of all genes. Figure 1 of BNs with a selection probability c i (1 ≤ i ≤ N). At each time, there is a switching probability to determine whether to select a constitute BN as the governing BN. To incorporate the randomness of gene activity, a small perturbation probability p > 0 is assigned to each gene. This random perturbation allows all states of a PBN to communicate with each other, thereby resulting in an ergodic Markov chain with a SSD π π π = … − { , , } 0 2 1 n . The long-term behavior of PBNs is characterized by this SSD. The simplest PBN is a Boolean network with perturbation p (BNp) 2 . A BNp inherits the attractor structure from the original BN without perturbation, the difference is that the random perturbation allows a BNp jump out of the BOA of an attractor and then evolves into the BOA of another attractor.
One-bit perturbation. One-bit perturbation is the simplest function perturbation that flips the output value for a specific input of a Boolean function. As a function perturbation changes some of the state transitions, the BOA of some attractors may enlarge or shrink, some attractors may disappear, and some new attractors may appear. If gene i is regulated by k i genes, then there are − 2 n k i system states in which the input value for gene i is equal to a specific value ). This means that a one-bit perturbation on this input value will lead to − 2 n k i state-transition changes where the state of gene i is flipped. The detailed proof of this conclusion appears in ref. 9.
. Problem setting. Attractors represent the fixed points of dynamical systems and capture the system's long-term behavior, so modifying attractor states may lead to severe side effects on the biological system in question. Therefore, to be prudent we should avoid one-bit perturbations that lead either to the disappearance of attractors or to the emergence of new attractors. But it should be prudent to modify the fundamental landscaps.
In biology, cancer can be characterized by the imbalance between cellular states (attractors), such as proliferation and apoptosis (programmed cell death). The objective of intervention is to keep cells away from certain states (for example, metastatic cancerous phenotypes). In order to avoid unpredicted side effects on the biological system in question, it should be cautious to change the fundamental landscape of the state transitions, namely, the system's attractor structure. In this paper, we restricted the one-bit perturbations that lead neither to the disappearance of attractors nor to the emergence of new attractors. Given the Boolean network BN with attractors A 1 , A 2 , … , A m , the first challenge is to identify such one-bit perturbations and avoid applying them.
The next challenge is to find the optimal intervention from the candidate one-bit perturbations in terms of maximizing the size of the BOA of expected attractors and minimizing that of unexpected attractors. As in previous studies, we classify the attractors into a desired attractor set D and an undesired attractor set U according to some target genes. The objective function to be maximized is denote the size of the BOA for attractor A l before and after perturbing Algorithm. To keep attractors unchanged, we must first identify the one-bit perturbations that may destroy the original attractors. Consider the singleton attractor x 1 x 2 x 3 x 4 x 5 = 00100 for the BN in Fig. 1(A), the corresponding input value for gene x 1 is x 5 x 2 x 4 = 000. The one-bit perturbation f 1 (1) will lead the next state of 00100 being 10100, instead of being itself. Therefore, we should avoid considering this one-bit perturbation. By repeating this for other genes, we also find that the one-bit perturbations of f 2 , and f 5 (1) should also be avoided. By applying this process to all attractor states, we identify the one-bit perturbations that may destroy the attractors. In the truth table in Fig. 1(A), those one-bit perturbations have been marked by red color. However, some one-bit perturbations may lead to new attractors. Such perturbations are difficult to identify, so we leave this problem to be solved in the BOS-updating process. If we encounter a new attractor in the perturbed BN, we stop searching with the current one-bit perturbation (see Algorithm 3, lines 42-47). The pseudocode for identifying the one-bit perturbations that may affect attractor states is summarized in Algorithm 1. Like the definition of BOA, we define the BOS of state s to be the transient states that can reach it. If state s is an attractor state, its BOS refers to those transient states which directly reach it, not including those transient states which reach it through other attractor states in the same cyclic attractor. For example in Fig. 1(B) to its attractor A l . In the traversal process, the BOS size of the visited state s′ ∈ s 1 s 2 … s l−1 a is incremented by 1 at each step. Two situations must be considered when arriving at state a: If A l has previously been identified as an attractor, the traversal process should stop at state a. If A l has not been identified as an attractor, the traversal process continues. When state a is revisited, the new attractor A l can be identified. Finally, we should decrement by 1 the BOS size of the attractor states in A l (apart from a itself). The pseudocode for determining the BOS size for all states and the attractors is summarized in Algorithm 2. After identifying the BOS of all 2 n states and the attractors in a BN, the exhaustive strategy simply repeats Algorithm 2 to determine the size of the BOA for the perturbed networks. We propose herein an algorithm that uses the BOS size obtained in Algorithm 2 to determine the size of the BOA after a one-bit perturbation. Given a one-bit perturbation f i p ( ) , we assume S p denotes the set of − 2 n k i states whose transitions are changed. Our idea is that the ultimate BOS can be obtained by sequentially updating the path of each of the − 2 n k i states. For each state s ∈ S p , the updating includes two processes: the SUB process and the ADD process. The SUB process updates the BOS size for all states in the current path P S , whereas the ADD process updates the BOS size for all states in the modified path ′ P S . The updating process may produce some cycles. If the cycles are emerged cyclic attractors in the perturbed BN′ , we should stop at once as the new attractor appears; otherwise, they are just temporary cycles formed in the updating process. The temporary cycles make the updating process more complicated. We now apply the network in Fig. 2 to illustrate the updating process in detail. The original BN contains eight states and one singleton attractor 111. The perturbation is implemented in the second bit of the function for the third gene. This intervention leads to a change in the state transitions of states 001 and 011.

Algorithm 2: Determine the basin of state (BOS) and the attractors set (AS
For the SUB process, we first update the transition of state 001, which changes from state 111 to 011. Its current evolution path P S is 001 → 010 → 111 [see Fig. 3(A)]. The BOS size of other states in P S is simply updated by BOS(s′ ) − BOS(s); that is, the BOS sizes of 010 and 111 become 1 and 3, respectively [see Fig. 3(B)]. Next, we consider the SUB process for the transition of state 011. Its current path P S is a temporary cycle 011 → 110 → 001 → 011 [see Fig. 3(D)]. In this case, its new transition destroys the cycle and leads to other states entering its BOS. Note that the BOS of state 110 does not change because the current path P S is a cycle. The updating process for the BOS takes the following form: we first add BOS(110) to the BOS size of state 001, and then add the new BOS(001) to the BOS size of state 110 [see Fig. 3(E)].
For the ADD process, we also consider first the transition of state 001 to 011. This transition transforms the modified path ′ P S into a temporary cycle 001 → 011 → 110 → 001 [see Fig. 3(C)]. The consequence of the temporary cycle is that state 110 is no longer in the BOS of state 001, and state 011 is no longer in the BOS of state 110. Therefore, the BOS-updating process takes the following form: we first subtract BOS(110) from the BOS size of state 001, and then we subtract BOS(011) from the BOS size of state 110. Next, we consider the ADD process for the transition of state 011. Its modified path ′ P S is 011 → 111, which directly enters an attractor. The BOS-updating process simply adds BOS(011) to other states in ′ P S [see Fig. 3(F)]. The pseudocode to determine the updated-BOS size for all states is summarized in Algorithm 3.    Fig. 1 under a one-bit perturbation f 1 (3) . This perturbation affects the transitions of four states: 01000, 01100, 11000, and 11100. The first column lists the 32 states, and the second column is the BOS of each state in the original BN. The following four columns present the process for updating the BOS size according to the order of the four states. The updating of the current path P S is highlighted in green and that of the modified path ′ P S is highlighted in yellow. The last column is the resulting BOS size for all states. This table shows that the updating process visits very few parts of the whole state space. Therefore, it is less time consuming than the exhaustive strategy.
Scientific RepoRts | 6:26247 | DOI: 10.1038/srep26247 Based on algorithms 2 and 3, the workflow of our BOS-based algorithm to identify the optimal one-bit perturbation includes the following main steps: Step 1: Apply Algorithm 2 to build the BOS size of all states in the original BN and identify its attractors.
Step 2: Apply Algorithm 1 to identify the one-bit perturbations that may destroy the original attractors.
Step 3: For a potential one-bit perturbation f i p ( ) , apply Algorithm 3 to obtain the updated BOS size of all states.
Step 4: Calculate Δ B for each feasible one-bit perturbation.
Step 5: Repeat Step 3 for other potential perturbations and find the perturbation that gives the maximal Δ B.

Results and Discussion
In this section, we compare the SSD algorithm, the exhaustive algorithm, and the proposed BOS-based algorithm both in simulated networks and in two biological networks. All calculations use the PBN Toolbox (http://code. google.com/p/pbn-matlab-toolbox/), which calculates the SSD. We implemented both the exhaustive and the  Simulation on synthetic networks. To compare the performance of the three algorithms on synthetic networks, we randomly generate 20 BNs for each number n of genes (n = 5, … , 25) and the average indegree K (K = 3, 4). The perturbation probability of each gene used to calculate the SSD is p = 0.001. Figure 5 shows the average time of the three algorithms for a one-bit perturbation. The solid lines are for K = 3 and the dashed lines are for K = 4.
First, the BOS-based algorithm performs best while the SSD-based algorithm performs worst. In particular, the time needed to calculate the SSD increases with the number n of genes more quickly than the time required to directly calculate the BOA size of all attractors. This is apparent by observing the slope of the lines in Fig. 5.
Second, the ratio of the average computing time for the exhaustive algorithm to that for the BOS-based algorithm is almost constant. This result is reasonable because, if the average indegree is K, then the BOS-based algorithm only needs to update the evolution path of the 2 n−K states whose transitions have been changed, whereas the exhaustive algorithm has to do the same for all 2 n states.
Third, the average indegree K does not influence the performance of the SSD algorithm. However, it affects the performance of both the exhaustive and the BOS-based algorithms. Concerning the exhaustive algorithm, the average computing time for K = 4 is slightly longer than that for K = 3. One major reason for this result is that the average length from a state to its attractor for BNs with K = 4 is longer than for BNs with K = 3. Therefore, it takes longer to determine the size of the BOA for K = 4 than for K = 3 13 . Concerning the BOS-based algorithm, the effect of the average indegree is completely the opposite because 2 n−K states are affected by a one-bit perturbation. Therefore, given a specific number n of genes, the BOS-based algorithm needs to update fewer states for BNs with K = 4 than for BNs with K = 3.
Apart from the time issue, the space complexity of the SSD is O(2 n × 2 n ), whereas that of the other two algorithms is only O(2 n ). Therefore, the proposed BOS-based algorithm can be applied to larger BNs than the other two algorithms. Figure 5 shows that the proposed BOS-based algorithm takes about 80 seconds for BNs with 25 genes whereas the exhaustive method takes about 300 seconds even for BNs with 20 genes.
Biological networks. In this section, we apply the three algorithms to two biological networks: the metastatic melanoma network 14 and the T-helper network 15 . The metastatic melanoma network contains seven key genes: WNT5A, pirin, S100P, RET1, MART1, HADHB, and STC2, which are labeled x 1 , … , x 7 . It has four singleton attractors: 0101111, 0110110, 0111110, and 1000001. This network was used in previous works to illustrate the effectiveness of various intervention strategies. Increasing the levels of the Wnt5a protein through a melanoma cell line is believed to alter the competence of the cell. Therefore, attractor 1000001 is undesirable, whereas the others are desirable. The T-helper network has 23 key genes: GATA3, IFN-ß, IFN-ßR, IFN-γ , IFN-γ R, IL-10,  IL-10R, IL-12, IL-12R, IL-18, IL-18R, IL-4, IL-4R, IRAK, JAKI, NFAT, SOCS1, STAT1, STAT3, STAT4, STAT6, T-bet, and TCR, which are labeled x 1 , … , x 23 . The average indegree of this network is K = 1.7. This network has 33 single attractors, which can be classified as 16 undesirable attractors and 17 desirable attractors: x 1 = 0 and x 1 = 1, respectively.  Figure 6 presents the size of the BOA of the two networks before and after the optimal one-bit perturbation identified by the BOS-based algorithm. Red bars denote the size of the BOA for the undesirable attractors and black bars denote the size of the BOA for desirable attractors. First, the identified optimal one-bit perturbations retain the attractors of the networks. Second, the optimal one-bit perturbation obviously reduces the size of the BOA of the undesirable attractors. Concerning the metastatic melanoma network, the optimal one-bit perturbation identified by the BOS-based algorithm is f 4 (4) , which can both retain the attractors of the system and maximally reduce the size of the BOA of undesirable attractors. This result is consistent with previous studies in ref. 10.
To identify the optimal one-bit perturbation, Table 1 presents the computing time for the three algorithms. The time for the SSD-based algorithm is not presented because it is impossible to deal the state-transition matrix of size 2 23 × 2 23 with currently available computers. The exhaustive algorithm takes about 20 hours to identify the optimal one-bit perturbations, whereas the proposed BOS-based algorithm takes only about three minutes.

Conclusions
From the point of view of biology, structural intervention has the potential to permanently alter the dynamic behavior of gene regulatory networks, and then push the system so that it evolves in a desirable direction. Although previous studies applying matrix perturbation theory ensure that the perturbation obtained is the optimal one-bit perturbation that can maximally shift the probabilistic mass of the SSD toward desirable states, its application is limited to networks with only about 15 genes.
Although Xiao et al. proposed to directly observe the changes in the size of the BOA before and after a function perturbation, they did not give a systematic computational algorithm that can efficiently solve this problem. In this paper, we propose an algorithm that efficiently determines the size of the BOA after a one-bit perturbation. To avoid unexpected side effects, we require that the optimal one-bit perturbation maintain the same network attractors. Our motivation is that, if we know the BOS size of all states, then the modified size of the BOA of all attractors after a one-bit perturbation of f i p ( ) can be obtained by updating only the evolution path of the − 2 n k i states whose transition have been altered. Results obtained from both synthetic networks and two real biological networks show that the proposed BOS-based algorithm is more efficient than both the exhaustive algorithm and the SSD-based algorithm. According to the results of the numerical experiments, the proposed BOS-based algorithm can be applied to networks with 25 genes. Finally, note that the time complexity of the proposed BOS-based algorithm is still exponential. To solve the optimal-intervention problem for larger BNs, we must study other methods to determine the size of the BOA of perturbed networks quickly.