The basis of easy controllability in Boolean networks

Effective control of biological systems can often be achieved through the control of a surprisingly small number of distinct variables. We bring clarity to such results using the formalism of Boolean dynamical networks, analyzing the effectiveness of external control in selecting a desired final state when that state is among the original attractors of the dynamics. Analyzing 49 existing biological network models, we find strong numerical evidence that the average number of nodes that must be forced scales logarithmically with the number of original attractors. This suggests that biological networks may be typically easy to control even when the number of interacting components is large. We provide a theoretical explanation of the scaling by separating controlling nodes into three types: those that act as inputs, those that distinguish among attractors, and any remaining nodes. We further identify characteristics of dynamics that can invalidate this scaling, and speculate about how this relates more broadly to non-biological systems.

The development of a comprehensive theory for control of complex biological systems is a major goal of systems biology [1,2]. While many results are available in control theory, its common assumptions have needed to be updated for application in biology: biological processes are typically intrinsically nonlinear [3] and we often care only about coarse-grained output states [4] as opposed to the ability to force a system into any possible state [5].
In the context of Boolean models of gene regulatory networks, a particularly useful concept in the static control of regulatory logic is the control kernel (CK). Introduced in [6], the CK is defined as a minimal set of genes such that external control of their expression is sufficient to steer the network dynamics toward a desired steady gene activation pattern (attractor).
Existing literature on Boolean networks contains a number of related definitions of minimal control sets and efficient approaches to finding them. First, stable motif analysis similarly defines control sets by identifying positive circuits in the network that can self-sustain an associated state (namely, trap spaces in the dynamics) [7]. Individual cyclic attractors are not determined by the method, but grouped into "quasi" attractors that share the same stationary parts. Otherwise, sets of control nodes are defined for quasi-attractors in the same way as control kernels. Alternatively, a control set can be defined as a single set of nodes that can force the system to any of the original steady states. Feedback vertex sets, namely minimal sets of nodes whose removal deprives the network of all its directed cycles, efficiently produce an upper bound on minimal control set size that does not require knowing the specific dynamics governing each node [8,9,5]. Finally, the method of differentially expressed positive circuits begins with minimal information about the network structure to efficiently find sets of nodes that, when forced, move the system from one of the network's attractors to another of its attractors [10].
Here, we focus on the CK definition of control, such that a CK particular to one of the original steady states produces that state regardless of the initial condition. Using this definition, and focusing on the particular steady state most often reached when starting from random initial conditions, Kim et al. found empirically in 8 representative models that the size of the CK is both uncorrelated with the network size and small [6]. This is in agreement with both the aforementioned existing theoretical approaches and an increasing wealth of empirical findings in cellular reprogramming experiments [11,12,13,14,15,16,17], where overexpression of fewer than a dozen transcription factors is capable of selecting a desired phenotypic behavior in systems with tens of thousands of genes. If controlling a network is not an increasingly difficult task as the size of the network increases, then the question arises of why some networks are more amenable than others to external control.
Naively, one might guess that the size of CKs would be close to the logarithm of the number of original attractors in a system. If the possible states of individual nodes are roughly equally likely given the states of other nodes, then forcing one node into a particular state will typically cut the number of attainable states in half. Then by controlling c nodes we expect to be able to narrow 2 c possible states down to 1 possible state. Yet the problem of finding the minimal set of controlling nodes is nontrivial, proven to be NP-hard by Akutsu et al. 1 [18], so that no algorithm is expected to run faster in the worst case than checking every possible subset of increasing size.
Here, across 49 example biological networks, we compute control kernels for all attractors. We first corroborate earlier results that CKs remain relatively small, finding typical CK sizes smaller than about a dozen nodes even in large networks with up to 72 interacting components.
More importantly, we illuminate the origins of this easy controllability by showing that the average CK sizes do in fact scale logarithmically with the number of attractors. For a number of reasons, this close correlation with the number of attractors has not been highlighted in prior studies. In particular, the scaling is not as clearly evident when looking only at the attractor most often reached from random starting points [6], nor when bounding the control set using the feedback vertex set, which does not require solving for individual attractors [5].
The theoretical justification for the observed scaling is subtle, and we progress here by breaking down the CK problem into three subproblems, one being the witness set problem from discrete mathematics. The lower bound realized by the witness set typically provides a good first approximation to the full size of the CK, which connects the observed logarithmic scaling to known results in computational learning theory.

Boolean Networks and Attractor Dynamics
This section defines common terms and concepts from the study of Boolean dynamical systems. Readers familiar with Boolean network dynamics can proceed to the next section.
Given their abstract nature, the applicability of Boolean networks extends far beyond the mathematical description of biological regulatory networks (cfr. [19,20,21,22], to cite just a few examples of a large literature). In the context of theoretical biology, their relevance was first highlighted by Kauffman [23], who identified cell types with the attractors of network dynamics, allowing for their mathematical study. The success of Boolean networks in theoretical biology can be attributed to their simplicity, forming the most parsimonious mathematical representation displaying systemic properties of real biological networks [24,25].
The state of a Boolean network at time t is defined by the state of its n nodes, x i (t), with i = 1, . . . , n. The Boolean nature of the model means that there are only two possible states for node i: ON, which corresponds to x i (t) = 1, or OFF, with x i (t) = 0. When the network describes cell regulation, its nodes are often representative of interacting genes, and x i (t) is the Boolean approximation of the expression level of gene i, with the gene being expressed when x i (t) = 1, and inactive otherwise.
The dynamics of a deterministic Boolean network is defined by n Boolean functions f 1 , . . . , f n whose input is the x 1 (t), . . . , x n (t) array, and whose output is the configuration of the network at time t + 1: x 1 (t + 1) = f 1 (x 1 (t), . . . , x n (t)) · · · x n (t + 1) = f n (x 1 (t), . . . , x n (t)) . (1) The form of these Boolean functions can be determined by experiments [26,27]. This parameterization corresponds to a synchronous update of every node in the network. Asynchronous dynamics are more realistic but less mathematically tractable [28] (see Supplemental Material for a discussion of our approach to asynchronous updating). With n nodes, there are 2 n possible network configurations (i.e. gene expression patterns in the case of a genetic network). We will refer to these configurations as states of the network, and to the space of all possible network states as the configuration space. The dynamics of the network is represented by a time series of states. Though large, the size of the configuration space is finite. Therefore, whenever the functions f i are deterministic, these dynamical paths will eventually converge to either a fixed state or a cycle of states. These special sets of states are the attractors of the Boolean network. The set of initial states that converge to a given attractor is known as its basin of attraction. In the case of gene regulatory networks, the core assumption of Kauffman's proposal is the identification between attractors and cell types. Boolean networks can have many distinct attractors, making them ideal for describing how the gene interactions encoded within a single fertilized egg can give rise to a large number of stable cell types [29]. Regardless of their specific interpretation, the attractors of a Boolean network are typically among its most important features, as they correspond to the only possible steady-state dynamics once any transient behavior has ended. The remainder of this manuscript is devoted to the study of how steady-state dynamics are altered when external control is exerted over the network.

Logarithmic Scaling of Control Kernel Size
We will assume here that external control is exerted over the network by setting the state of one or more of its nodes to constant values (also referred to as "node state override" [5]). This is done regardless of the updating rules associated to those nodes and the inputs provided by the remaining nodes. This pinning of p nodes is formally a map between dynamical systems, turning a network with n nodes into a "controlled" network with n − p nodes. We define a CK as the minimal amount of pinning necessary to force the system into a desired attractor: Definition 1 (Control Kernel): For a given attractor A, a control kernel is defined as a set of nodes of minimal order whose pinning reshapes the dynamics such that the basin of attraction of A becomes the entire configuration space.
By this definition, for a given A multiple CKs of the same size may exist, or there may be no existing CK. The most naive approach to finding a CK for a desired attractor A is simple but time-consuming: pin every possible subset of nodes of increasing size to their values in A and test the network's dynamics to determine whether all initial states of the network converge to A. In the case of networks with highly modular organization, it is possible to be much more efficient by computing CKs for each module separately (see Methods).
For every fixed-point attractor, at least one CK is guaranteed to exist: At worst, we can succeed in control by pinning all nodes to their values in the desired attractor. Cyclic attractors do not necessarily have a CK. We know we cannot pin any nodes that do not have a constant value in the attractor, as this will not be consistent with the original attractor.  To determine the typical sizes of CKs in biological examples, we analyzed models in the Cell Collective database [30]. They span, but they are not limited to, biological cellular processes including cycles, differentiation, plasticity, migration, and apoptosis. They exemplify biological processes in humans and other animals, as well as plants, bacteria, and viruses. For 44 networks, using a reasonable amount of computing time, 3 our algorithm was able to find CKs for all attractors for which a CK exists. For an additional 5 networks for which we were unable to exhaustively find all attractors, we computed CKs for a set of attractors reached from 10 4 to 10 6 random initial conditions (see Methods). The networks we analyzed range in size from 5 to 73 nodes (see Table 1 in Methods). In computing CKs for all known attractors of each network, we depart from the approach adopted in [6], which studies only the CK for the attractor with the largest basin. We find empirically that the average CK size |CK| in each network is close to the base-2 logarithm of the total number of attractors r (Fig. 1). The reasons for this scaling are subtle, and in the next section we propose a theoretical interpretation and speculate about whether this scaling persists at larger n.
The observed trend is a potential explanation for why models of biological networks have small CKs. Biological CKs may remain small regardless of the size of the network because the necessary control depends mainly on the logarithm of the number of possible outcomes, which may be fundamentally limited in functional networks.
Comparing our results with those obtained using alternative methods In Fig. 2, we compare our control kernel size results to two other methods: stable motifs [7] and feedback vertex sets [8,9,5]. First, the method of stable motifs groups cyclic attractors with identical stationary parts into quasi-attractors (see Introduction and [7]), meaning that controlling for individual cycles is not necessarily possible in all cases. Still, we see in Fig. 2B very similar scaling results for stable motif control set sizes as a function of the number of quasi-attractors. 4 Second, we expect feedback vertex sets to provide an upper bound on the size of the single set of control nodes needed to reach all attractors [5]. A set achieving an analogous goal, but still defined in terms of attractor-specific CKs, is a "union" CK, namely a set of nodes that includes a CK for each attractor. 5 Plotting the sizes of the union of CKs across attractors for each network, we see similar results to the sizes of feedback vertex sets ( Fig. 2C and D). Note, too, that logarithmic scaling of the feedback vertex set size is much less pronounced than in the case of attractor-specific CKs in Fig. 2A.
Additionally, we analyze the overlap between the specific nodes in our control kernels and the controlling nodes identified by the other two methods. Note that the existence of

Theoretical Interpretation
In this section we investigate the conceptual reasons for the scaling behavior in Fig. 1, connecting it to an unsolved problem in computational learning theory. What follows is a schematic guideline of the steps that lead us to the result: • We start by identifying three contributions to the CK, with different properties: input nodes, distinguishing nodes, and additional nodes. We then propose a method for bounding the CK size that naturally provides the controlling nodes in this order.
• To find CKs for a network, the global dynamics must generally be re-solved for multiple combinations of pinned nodes of increasing size. We show that the additional nodes are the only nodes whose identification requires this repeated solving. Identifying input nodes is trivial, and distinguishing nodes require knowing only the network's original attractors and solving a problem equivalent to finding minimal order witness sets.
• We then show that the additional nodes provide a subdominant contribution to the size of the CK for the networks we study. By neglecting the additional nodes, we exhibit a "first order approximation" of the CK that we compare to the full calculation in Fig. 1. We show that the approximation scales logarithmically with the number of attractors r, thus explaining our main result.
• We also discuss cases that may significantly deviate from this scaling behavior. We identify two possible scenarios: a) Cases where the contribution of the additional nodes is dominant; b) Cases where the average size of the minimal witness sets exceeds log 2 r.

Input Nodes
Let us start by considering a special class of nodes that are easy to analyze separately: those that have a constant value in time. In biological models, there are often examples of nodes whose dynamics is described by the identity function x p (t + 1) = x p (t), as it offers a simple implementation of external inputs acting on the network. Input nodes appear often in biological networks as either signals from other subsystems or as external variables controlled in experiments (e.g. [31]). With this in mind, we will assume the following definition: Definition 2 (Input node): A node is an input node if its updating rule is the identity function.
Having input nodes increases the number of attractors in a network, as each setting of the inputs corresponds to at least one distinct attractor.
In steering the dynamics towards a given attractor, we are forced to pin the input nodes to the values they take in the attractor. Otherwise, both values for each input would remain viable, and we cannot select between them by controlling the other nodes. As this step is unavoidable we can easily conclude the following: Proposition 1: A control kernel must include all the input nodes of the network. Therefore, we pin all input nodes as the initial step of our analysis. 6 6 When present, input nodes act as control nodes as defined in [18], and in agreement with the findings of that study. Note that it is possible for nodes that were not input nodes in the initial network to become effective input nodes after this pinning procedure. A simple example is a node whose state depends only on input nodes. We do not distinguish this situation in our study. Therefore, these nodes are counted among the distinguishing nodes as defined in the next subsection.

Distinguishing Nodes
Let us now examine the effect of the pinning procedure on the state transitions. It is easier to analyze the changes in terms of the transition matrix, the table of correspondences between the state at time t, x = (x 1 , . . . , x n ) (the left side of the table) and the subsequent state at time t + 1, x = f (x) = (x 1 , . . . , x n ) (the right side): state at time t state at time at time t + 1 Pinning node j to 0 (or 1) corresponds to deleting column x j and x j and deleting the 2 n−1 rows with the opposite value x j = 1 (or 0). Note that a fixed point attractor corresponds to a row in the transition matrix with the left side equal to the right side (i.e. x k = x k for all k in that row). Pinning a node will remove a preexisting attractor if, and only if, it appears on the left side of a row we are eliminating. When pinning to control the network into a desired attractor, we must at least eliminate all other initial attractors.

Definition 3 (Distinguishing nodes):
A subset of nodes for which a pinning exists that is both compatible with attractor A and incompatible with the other initial attractors of the network is a set of distinguishing nodes of A.
In order to remove all attractors other than attractor A, we need to pin a set of its distinguishing nodes, which leads us to the following: Proposition 2: A control kernel of attractor A must include a set of distinguishing nodes for attractor A with respect to the network's original attractors.
Therefore, a more efficient and still exact method for finding CKs is to check not every possible subset of nodes, but to check only distinguishing node sets (since we know that the CK must include at least one of them). This is the method we use to obtain our results in Fig. 1.
We now have a lower bound on the size of the CK. To state this precisely, let us consider a network with n nodes, r attractors, and m input nodes. Let us then select an attractor A and pin all input nodes to values compatible with A. The new network (with initial input nodes pinned) will possess a certain number r of attractors: A plus other attractors B 1 , . . . , B r −1 . Let us now call w (1) any minimal order set of nodes distinguishing A from B 1 , . . . , B r −1 . Then the size of the CK is bounded below by m + |w (1) | (propositions 1 and 2).

Additional control nodes
The reason that input and distinguishing nodes provide only a lower bound to CK size is that the pinning intervention can also create new attractors. For instance, we get a new fixed-point attractor whenever the left and right sides of a row in the transition matrix are equal over the n − 1 remaining columns, with differing values only in the column that we are eliminating. Note that pinning an input node cannot create new attractors. 7 Pinning the nodes in w (1) will remove B 1 , . . . , B r −1 , but this may now create new attractors. We can also get an upper bound by iteratively pinning subsequent minimal distinguishing node sets that are needed to remove these additional attractors: where w (i) with i ≥ 2 are the sets of additional nodes we need to pin after we have already removed B 1 , . . . , B r −1 . This iterative procedure produces only a bound instead of the exact CK size because iterative pinning can in some cases mislead us toward a pinning that works but is not minimal. To mitigate this aspect, in performing the iterative procedure, we choose among different minimal sets of distinguishing nodes by picking the one that creates the fewest new attractors. We find numerically that the bound computed in this way is often tight (see Fig. 12 in Methods). Also notice that finding the distinguishing nodes w (1) only requires knowledge of the initial attractors. On the contrary, to find the additional control nodes we need to distinguish the selected attractors from new attractors created during each round of pinning. This requires solving the global dynamics of the network for each combination of pinning nodes.

First order approximation of the control kernel
In the following, we will refer to the pinning of the input nodes as round zero in this iterative procedure. Pinning w (1) is what we will call the first round of pinning, while any additional control nodes will be pinned during additional rounds of pinning. We summarize this procedure in Fig. 3, where we show the process determining the CK of the attractor with smallest basin size in a gene regulatory network underlying early cardiac development [32].
Despite the iterative nature of our pinning procedure, these three steps possess very different characteristics. Round zero is trivial, and it does not introduce new attractors. The first and the subsequent rounds of pinning represent NP-hard problems, and they can introduce new attractors.
Remarkably, in our data set the first round typically identifies most of the necessary control nodes (Fig. 4). For this reason, we will call the input and distinguishing nodes a first order approximation of the CK, neglecting any additional control nodes. To understand why we expect this approximation to be reasonable, let us examine the likelihood of introducing new attractors through pinning. Nkx2.5

Pinning of the input nodes (Round 0):
We want the attractror with: • all other genes inactive exogen_BMP2_I and exogen_canWnt_II are input nodes. We pin them to the values they take in the selected attractor (active and inactive, respectively).

Removal of the unwanted attrators:
We pin GATAs, and force it to be inactive.

Removal of the unwanted attrators:
We pin Tbx5, and force it to be inactive.
Attractors after pinning: 1 (The iterative process ends) Figure 3: Schematic of our iterative pinning procedure for one example attractor in the network describing the core gene regulatory network for early cardiac development [32].
(Note that we use the iterative procedure only to put bounds on the control kernel size as in Eq. 2, while the true minimal control kernel can be computed by an exhaustive procedure as explained in the Methods. In this particular case, the two methods produce identical results.)  Figure 4: After accounting for input nodes, minimal distinguishing node sets w (1) account for much of the size of control kernels in the 49 biological networks. A: Some networks are fully controlled after pinning only input nodes, but most require further pinning. B: Vertical bars indicate the mean proportion of non-input control nodes that correspond to minimal distinguishing nodes (blue) and those that must be set by additional pinning (orange). Network IDs correspond to those listed in Table 1. Some cases remain "undetermined" as identifying the minimal size of distinguishing node sets can be computationally infeasible (see Methods). C: In most networks, the mean number of additional nodes needed beyond the minimal distinguishing nodes remains small.

Fixed-point attractors:
With certain assumptions about network properties, it can be possible to estimate the expected number of fixed-point attractors. This is the case, for example, in a well studied class of dynamical systems known as Random Boolean Networks (RBNs) [23]. RBNs are defined in terms of a generative mechanism for both their topology and dynamical rules, and this can be used to prove that, on average, a given RBN possesses just one fixed-point attractor [33]. The networks in our database are not random networks. They were instead inferred from data by a number of research groups. Interestingly, after the input and distinguishing nodes have been pinned the number of additional attractors is also typically small ( Figure 5). A heuristic argument that we detail below reveals general conditions under which we expect this to be true.
The argument is very simple: Whatever states appear on the right side of the transition matrix, we have an even distribution of 0s and 1s on the left side. Without knowledge of the specific mapping of initial states to final states, our best assumption is that they appear in random order on the right side, leaving unspecified which initial state corresponds to each next state. For each row, the probability of having the right side matching the left side entry by entry is 1/2 n , where n is the current number of nodes we have not pinned. We have 2 n rows. Therefore, the expected number of fixed point attractors would be 1. Given the heuristic nature of this argument, we cannot assume this to be more accurate than just an order of magnitude estimate. One reason why this number could be significantly greater than 1 would be the presence of many input nodes (which would guarantee the existence of at least 2 m attractors), but we have already removed all the initial input nodes in round zero.
In this way, we have reduced the study of the controllability of each of our networks to the study of a set of networks without input nodes. Notice that RBNs do not have input nodes, unless that happens by accident. Pinning the input nodes makes our networks acquire one of the main topological features of RBNs.
Cyclic attractors: We have neglected cycles in our previous argument. A cycle over states i and j would require the simultaneous conditions a) left side of row i = right side of row j and b) left side of row j = right side of row i. Proportionally longer chains of conditions would be required for longer cycles. In the case of RBNs, the intuitive idea that longer cycles become less likely due to the multiple conditions they require holds only for relatively small networks, e.g. networks with fewer than about 20 nodes (cfr. Fig. 1 of [33]). This intuition inevitably breaks down as the network size increases, due to the combinatorial growth in the number of ways a cycle of states can be closed. Eventually, cycles becomes dominant for RBNs with more than about 300 nodes.
A proliferation of cycles created during the intermediate rounds of pinning could potentially prevent our iterative procedure from converging quickly. Nonetheless, this is not something we observe in the biological networks we analyzed. Fig. 5 shows that the number of attractors remaining after each iterative pinning, including cycles, is typically small. Interestingly, our networks fall within the size range for which RBNs would present a number of cycles comparable to the number of fixed points. Whether biological networks much larger than the sizes we analyzed display a proliferation of cycles similar to RBNs is something we cannot currently test. Indeed, it is an open question whether networks displaying large numbers of cycles are relevant to biology [34] and complexity science more generally. It is also worth noting that many cycles in large networks become unstable when the network state is updated in a nondeterministic asynchronous way [35] (see Supplemental Material).
Our brief discussions about fixed-point and cyclic attractors reveal that -once the input nodes are pinned, and in the absence of additional structure in the dynamics that significantly undermine the approximate randomness of the transition matrix (for an example of this see the section on Random Networks)-we can expect our iterative pinning procedure to converge quickly. With a small number of attractors remaining after each round of pinning, there is a fairly high probability of having, at some step, only one attractor, at which point a CK has been found and iteration stops. This probability, which we empirically observe to be 58% across the networks we study (Fig. 5), confirms our simple interpretation.
We have anticipated that our first order approximation of the |CK| consists of neglecting the additional nodes (the lower bound in Eq. 2). A fast convergence of the pinning procedure guarantees a small number of rounds of pinning (the number of nonzero terms |w (i) | with i ≥ 2 in Eq. 2), but it says nothing about the size of those terms, and therefore does not guarantee that additional nodes cannot dominate |CK|. Therefore, to rule out this possibility, we need to better understand the expected size of minimal distinguishing node sets.
Witness Sets: What we call a set of distinguishing nodes of an attractor is closely When all attractors are fixed points, we define A as the set of the r attractors A i . The witness set of minimal size, w i , for a given attractor is then equivalent to the minimal set of distinguishing nodes w (1) defined above.
Cyclic attractors add two complications to this correspondence between witness sets and distinguishing nodes, but the basic idea remains the same. First, when distinguishing an attractor A i from a cycle, we treat nodes whose values change within the cycle as always being distinguished from a fixed value in A i . 9 Second, when distinguishing a cycle from other attractors, we first restrict the "columns" of A to include only those nodes that have constant values in the cycle, as we assume that distinguishing nodes must be pinned to static values. 10 The key question then becomes: What sets the typical size of minimal witness sets w i ? Simple examples show that |w i | can easily be as large as n for an individual i -for instance, if n other A j are chosen that differ from A i by one bit flip for each of the n nodes. However, creating an example with large |w i | for all i requires larger r. The more relevant question then becomes: what sets the average witness set size over attractors |w| as a function of the number of attractors r?
Naively, one might expect that |w| scales as log 2 r: On average, revealing a given bit will distinguish the desired attractor from about half of the remaining attractors, so one would expect to be able to distinguish any given attractor using about log 2 r bits.
Known topological features exist that naturally produce a logarithmic scaling of the number of distinguishing nodes. For example, we have already seen that a large number of input nodes in the network would induce logarithmic scaling. This also holds when the network possesses disconnected components, though this is not the most relevant case in biology. 11 To a lesser extent a hierarchical structure of interacting modules would favor the same scaling. Input nodes and modularity are important features of our database, but further analysis indicates they are not solely responsible for the log 2 r scaling (see Methods).
Even in the absence of such topological features, numerical experiments show that log 2 r is often an apparent upper bound on |w| , and this was long conjectured to be an exact result [36]. Yet definite exceptions to this apparent bound are now known to exist, and one such exception will be detailed in one of the next sections (see Finite Projective Plane). We first discuss a numeric experiment demonstrating that logarithmic scaling is indeed a typical scenario.
We start by randomly selecting r Boolean vectors and assuming that they constitute the set of attractors A. This way we leave the topology of the hypothetical network generating them unspecified. The result is shown in Fig. 6, where average |w| values are plotted for increasing values of n, each point averaged over realizations of A. The maximum values found for |w| always remain below the log 2 r reference line. The average |w| points fall along seemingly convex curves -anchored to the log 2 r line at r = 2 and r = 2 n , as expectedand depart from the log 2 r line by just a few units. The NP-hard nature of finding w limits our ability to simultaneously increase n and r in this analysis. 12 Given this numerical evidence, we have reason to expect that witness set sizes, and therefore distinguishing node sets, are typically no larger than the logarithm of the number of attractors they distinguish. In each round of pinning after the first, the expectation that we have only a few remaining attractors then translates into small numbers of additional nodes added in each round (small terms |w (i) | in Eq. 2 when i ≥ 2). This, combined with the expectation that only a few rounds contribute, leads us to consider an approximation in which we neglect the additional nodes.  First Order Approximation: Neglecting the additional nodes provides a "first order" approximation of the CK: We find empirically that the minimal number of distinguishing nodes is always less than the logarithm of the number of attractors being distinguished (Fig. 7). That is, for the cases we test, |w where r j is the number of attractors sharing the j-th input configuration (j = 1, . . . , 2 m ). When (4) holds, we can additionally bound the average minimal distinguishing node size |w (1) | over all r = r j attractors of a given network (see Methods): |w (1) | ≤ log 2 r. Combined with the fact that m ≤ log 2 r, we obtain the following bound on the average first-order approximation: We validate this for our cases in Fig. 8, finding in fact that |CK (1) | is bounded more strongly by log 2 r (see Methods for further discussion). The bound (5) relies on the empirically validated inequality (4), the more general validity of which we explore in the next section.
We can now summarize to draw our main conclusion. Our iterative pinning procedure adds only a small number of additional nodes after the first round, with the combination of input nodes and minimal distinguishing nodes w (1) representing the most sizable contribution to the CK (Fig. 4). Within the set of biological networks we analyze, this first-order approximation (Eq. (3), displayed in Fig. 8) accounts for the logarithmic scaling that we observe in Fig. 1. Furthermore, an apparent bound (5) on this approximation suggests that the logarithmic scaling of control may occur more generally.

Exceptions
The logic leading to the logarithmic scaling of |CK| could be invalid in two general cases: • Minimal distinguishing node sets may not be bound by the logarithm of the number of attractors they distinguish (inequality (4) may not hold); • Additional nodes may make a non-negligible contribution to the CK (approximate equality (3) may not hold).
Though we do not find either of these exceptions among the biological cases we analyzed, we explore a larger set of cases in these final sections to better understand the degree to which exceptions might be expected to arise in other real-world situations.
First, we describe a known construction that produces an exception to inequality (4). We then run our CK analysis on three ensembles of random networks to search for exceptions to logarithmic scaling.

Finite Projective Plane
If universal, the trend shown in Fig. 6 would explain the logarithmic scaling of |CK (1) | . But at least one exception exists. We owe its ingenious proof to Kushilevitz et al. [36], and it involves the geometry of a finite projective plane.
A finite projective plane [37] of order p (p being a prime number) is a set of q = p 2 + p + 1 points and q lines with the following properties: Kushilevitz et al. [36] used this construction to exhibit a set of vectors with |w| exceeding log 2 r for large enough r. These vectors are chosen to be the rows of the incidence matrix of such a plane (line vectors, or just lines in what follows) plus the rows of the identity matrix of order q (point vectors, or just points). Therefore, r = 2q in this example.
The proof proceeds as follows: To distinguish a point from the other points we just need to pin its only non-zero coordinate to 1. To distinguish it from the lines we first notice that this point is on p + 1 lines. Each line contains p more points. Therefore, for each line, we choose a point in it (different from the one we want to select), and pin its coordinate to 0. This way we pin p+1 more coordinates. Therefore, to distinguish a point from the remaining vectors, we need to pin p + 2 coordinates.
Distinguishing a line is easier. We just pin the coordinates of 2 of its points to 1. In fact, no point has 2 non-zero coordinates, and no other line contains both those points.
We can now easily calculate |w| for this set of vectors: where p = ( √ 2r − 3 − 1)/2. As p ∼ √ r, this construction gives rise to a polynomial scaling of witness set sizes with respect to r, and consequently to values of |w| far exceeding log 2 r when r is sufficiently large. The first prime number for which this happens is p = 17, corresponding to r = 614 attractors and a network with n = 307 nodes 13 .
As relevant as the existence of this example is the likelihood of analogous cases. One simple test to perform is to check whether flipping a single entry in one of the vectors is enough to bring |w| below log 2 r. We have explicitly checked this, but in the worst case a bit-flip perturbation instead adds to |w| (see Supplementary Material for a more thorough analysis).
It is important to stress that -when interpreted as the attractors of a network-these projective plane exceptions correspond to extremely biased scenarios, where each variable has a total ratio of 1s to 0s over the total number of attractor states equal to (p + 2)/2q ∼ 1/ √ r, i.e. already as low as 3% when r = 614. They constitute an infinitesimal part of the configuration space of all possible sets of r vectors. The likelihood of sets with |w| > log 2 r within this space (see [36] for some discussion of this problem), as well as the relevance in natural sciences of networks generating them as steady-states, remain open questions.

Random networks
To briefly explore to extent to which we expect exceptions to scaling in more general contexts, we apply our methods to three ensembles of randomly generated networks. First, we use the well-studied ensemble of p-K random networks [41]. In this ensemble, each node receives input from exactly K = 2 nodes, and Boolean logic functions are chosen such that the probability of the ON or 1 state appearing on the right-hand side of a given row of the truth table is p, a parameter that we vary. Second, we use another ensemble common in the literature that assigns each node a threshold to activation and a set of input nodes whose states are summed and compared to the threshold, with some input nodes acting to excite and some to inhibit [42,6]. Following Ref. [6], we set thresholds to zero and vary both the density of interactions as well as the relative proportion of inhibitory edges. In cases with little inhibition, networks are biased toward activation. To explore the effects of this bias, we also include an ensemble of "balanced" threshold networks, in which the thresholds of    individual nodes are set not to zero but to the average of incoming edges, such that the distribution of summed inputs is always centered at the threshold. The results of the control kernel analysis for networks sampled from these random ensembles are shown in Figures 9 and 10. Similar logarithmic scaling as in the biological networks is demonstrated by most of the random networks, with a few exceptions of large control kernels in the case of zero threshold networks (azure triangles in Fig. 9A). These exceptions can be understood by considering the effect of a strong bias toward activation on the set of fixed point attractors: Those fixed points with few or no activated nodes are expected to have small basins because most perturbations will lead toward states with more excited nodes. In particular, the state with all inactivated nodes, which is a fixed point for all zero threshold networks, often has a basin size of one in strongly biased networks. We might call such a fixed point an unstable "repellor" in the sense that all perturbations lead away from the state. Such a fixed point typically has a control kernel consisting of all or most nodes of the network, even when there are few overall attractors. All circled points in Fig. 9A have such a repellor fixed point (see "Outlier Random Networks" in Supplemental Material). These repellor cases are a concrete example of how biases in the dynamics can lead to additional nodes making a non-negligible contribution to the CK. When the bias toward activation is removed in the "balanced" networks, these exceptions disappear and control kernel sizes are again characterized by logarithmic scaling with respect to the number of attractors (dark purple circles in Fig. 9A).
We see in Fig. 9B and C that patterns in the size of the first-order approximation to the CK are similar to the biological networks: the number of input plus distinguishing nodes is typically near or below the base-2 logarithm of the number of attractors (and always below 2 log 2 r; Eq. (5)), and the number of distinguishing nodes is always below the logarithm of the number of attractors they distinguish (Eq. (4)).

Discussion
We analyzed control properties of a sizable fraction of the largest database of biologicallyinspired, Boolean networks currently available [30]. Following [6], we quantified the amount of external control necessary to drive a network to one of its original steady states in terms of the number of nodes whose states must be pinned. We found that the average amount of control necessary scales as the logarithm of the number of steady states of the unconstrained dynamics. At least within the range of network sizes we explored, this scaling is largely unaffected by the network size.
We show that the scaling is explained by the first two contributors to the control kernel (CK): the number of input nodes plus the number of nodes whose value must be specified to distinguish the desired state from the other possible steady states. On average, these two contributions account for most of the CK in the biological cases we analyze (Fig. 4). We explored the limits of this result and exhibited two scenarios that can invalidate it. First, the projective plane provides a highly contrived example that violates the logarithmic scaling of distinguishing nodes. Second, there are examples for which convergence of the iterative A B C Figure 9: Control kernels in random networks show similar characteristics to those seen in the biological cases. Here we choose networks of size 10, 15, and 20 from three ensembles: Erdös-Rényi networks with zero thresholds (azure triangles), Rényi-Rényi networks with balanced threshold inputs (dark purple circles), and p-K networks with random truth tables (pink diamonds). A: Mean control kernel sizes are typically near the base-2 logarithm of the number of attractors. Compare to Fig. 1. Outliers, highlighted with a black circle, share a bias toward excitation that creates repellor fixed points. B: The first-order approximation to control kernel size is typically below log 2 r and always below 2 log 2 r. Compare to Fig. 8. C: After fixing input nodes in round zero, distinguishing node sizes are empirically bound by log 2 r. Compare to Fig. 7. procedure is not fast, including biased dynamics that create fixed points with extremely small basins. We leave as open questions whether the absence of these exceptions within the biological models might be due to their biological nature, to their tendency for having near-critical sensitivity [43], or to the fact that these networks are designed to be easily interpretable.

Random truth table networks
Control kernels are fundamentally related to closed circuits in the regulatory networks, as suggested by the close relation to the methods of stable motifs and feedback vertex sets that rely on identifying these closed circuits. In particular, we expect that each non-input control kernel node is part of a directed closed circuit in the regulatory networks (as paths in the network not involved in cycles correspond to deterministic cascades that cannot support multiple possible states).
We expect our results to have important biological implications. First note that, differently from [6], we did not restrict our analysis to selecting one particular steady state (the "main attractor" with largest basin). On the contrary, we considered all possible fates of the dynamics. In this expanded analysis, we showed that the size of CKs corresponding to different fates exhibit a remarkably small variance (error bars in Fig. 1). A small variance in CK size would suggest that converting among alternative cell types may be easy for any cell type, whereas large variance could be caused by, for example, adult cell types that are relatively easy to control for (small CK) but are prohibitively difficult to reprogram back to their progenitors (large CK).
Our result may also be relevant to the interpretation of the large amount of data attained through the recent advancements in single-cell RNA-seq techniques [44]. Unsupervised clustering based on transcriptome profiles has already helped identifying otherwise elusive cell types (e.g. [45,46]). At the same time, cell types obtained through unsupervised clustering are sometimes difficult to relate to classification schemes based on morphology and function [47]. We have seen that the number of cell types (attractors) can be greatly affected by the presence of extracellular control, suggesting the possibility that single-cell sequencing experiments might also be exhibiting what the cell types would be in isolation, without external control factors. On the contrary, morphology and function might be identifying the smaller number of cell types determined by cell-to-cell signalling within a tissue.
The observed logarithmic scaling implies that biological networks may in fact be as easily controlled on average as would be expected if each pinning simply removed half of the remaining possible behaviors. A regulatory network with tens of thousands of genes could easily admit ∼ 100 attractors and appear extremely difficult to control. But logarithmic scaling of control implies that forcing the expression of few accurately selected genes might be enough to reduce hundreds of cell types to just one type with recognizable morphology or function.
The two cases that break logarithmic scaling highlight possible strategies for designing systems that are more difficult to control. First, the projective plane example implies average CK sizes that grow as the square root of r, though this would require much larger networks with many more attractors than are typically analyzed, and it is likely to require highly constructed, non-random structure. Significant differences show up only in particularly large examples: for instance, selecting one out of a million states could require controlling roughly 1000 nodes in a projective plane example instead of 20 assuming logarithmic scaling. On the other hand, biasing dynamics to create very small basin sizes could be a more natural way to force control kernels to be large, as in the biased threshold networks. Biological selection for systems that are difficult to control is an intriguing possibility.
More generally, in cases in which our assumptions hold, the rule of thumb that pinning a binary variable reduces the number of outcomes by a factor of two becomes more than just a vague intuition. In these cases, we are justified in focusing on distinguishing nodes as a first approximation of necessary control nodes. This greatly simplifies the search because distinguishing nodes depend only on the alternative long-term dynamics of the system, and not specifics about how the system gets there.

Upper bound on the first order approximation of CK size
Given a network with r attractors and m input nodes, we can split the attractors into 2 m disjoint sets, one for each configuration of the inputs. If r j is the number of attractors sharing the j-th input configuration, 2 m j=1 r j = r. Therefore, We know the inequality 1 r j r j k=1 |w (1) k | ≤ log 2 r j is not exact, as the projective plane construction represents a known exception. Nonetheless, it is exact within the networks we analyze (see Fig. 7). We want to show that assuming its validity leads to an upper bound on |w (1) | .
Let us call d = 2 m and x = (r 1 , . . . , r d )/r. The first term in (7), f (x) = x · log 2 x is a convex function, and we can find its extreme values as usual. We first extend it to real values, then find its minimum value outside the probability simplex ∆ d−1 , i.e. the domain with x · 1 = 1 and x j ≥ 0, ∀j. As ∇f (x) = (ln x + 1)/ ln 2, f has a minimum point atx = 1/e. The line identified byx is already orthogonal to the simplex. Therefore, its projection onto ∆ d−1 is simply 1/d. By virtue of the convexity of f , its actual minimum will be reached by the integers r j that best match an even distribution of the attractors among the 2 m sets. (This is expected, as f is just a negative Shannon entropy.) When this happens,  Input entropy Figure 11: The "input entropy" µ is typically close to the number of input nodes m in the 49 biological networks we test.
The maximum of f is reached near to each vertex of the simplex, when one r j is at its maximum (i.e. 1/r), while all others are at their minimum (i.e. 1 − (2 m − 1)/r). This gives the following upper bound on |w (1) | : For m = 0, we regain our hypothesis applied to the entire set of attractors, |w (1) | ≤ log 2 r. For m = log 2 r, |w (1) | = 0.
Further, by noting that both m and |w (1) | are bounded by log 2 r when inequality (4) holds, we can place a simple, conservative upper bound on the "first order approximation" of |CK| that depends only on r: We find empirically that the actual |CK (1) | values lie significantly below this conservative bound. For example, in the biological cases we test, |CK (1) | never exceeds log 2 r (see Fig. 8). The reason for this can be understood by analyzing eq. (7) more closely. Eq. (7) originates from the assumption that 1 r j r j k=1 |w (1) k | ≤ log 2 r j , ∀j. Smaller values of |w (1) k | for individual attractors can cause |w (1) | to lie below the maximum given by eq. (7). But this maximum itself varies depending on the distribution of the number of attractors across input states j, captured by the first term in eq. (7), f (x). We have seen that this term is a negative entropy, and that it approaches −m when the r attractors are (approximately) evenly distributed among the 2 m input configurations, that is to say when r j r/2 m . By defining the input entropy µ = −f (x), we can express the upper bound on |CK (1) | as For evenly distributed attractors, µ m, and |CK (1) | log 2 r. At the opposite extreme, if attractors are maximally unevenly distributed, µ can approach zero.
In the biological networks we test, we empirically find that m − µ is always too small to force |CK (1) | above log 2 r (that is, m − µ < log 2 r − j 1 r j r j k=1 |w (1) k |). Fig. 11 shows that µ is always close to m in these cases.
With the larger statistics provided by the ensembles of random networks, we find a few cases in which |CK (1) | is slightly greater than log 2 r (four cases in the zero threshold ensemble and four more cases in the balanced threshold ensemble; see Fig. 7B), but still never greater than 2 log 2 r, as expected.

Modular approach to finding attractors
Biological regulatory network models are in many cases highly modular (having subsets of nodes that interact much more with one another than with other subsets) and hierarchical (having upstream modules that do not depend on the behavior of downstream modules). As recognized before in the literature [48], the attractors of a Boolean network can be computed much more efficiently in modular hierarchical networks by analyzing modules separately. The basic idea is to find all attractors of upstream modules first, then use each upstream attractor to find corresponding attractors of downstream modules. As we detail in the following paragraphs, some subtleties make this process complicated, but it is still straightforward to accomplish computationally.
We begin with a decomposition of a given Boolean dynamical system into hierarchical modules, which we define here as strongly connected components of the system's causal network (with individual Boolean variables represented as nodes and dependencies between variables represented as directed edges). This produces a directed acyclic graph describing the dependencies among modules. First, for upstream modules that do not depend on any other modules, we find attractors using a brute-force approach that iterates over all possible states of the module. Then, for each downstream module D, we iterate over all possible combinations of attractors U of upstream modules on which D depends. For a given attractor U ∈ U, the states of nodes in the upstream modules are fixed to the values (perhaps dynamically varying, in the case of a cyclic attractor) that they have in U . Corresponding attractors for D are then found through a brute-force approach iterating over all possible dynamics of D given the input provided by U .
Two subtleties arise when computing the set of all possible attractors U for the nodes upstream of D. In the easiest case, D depends on a number of separate modules U 1 , U 2 , ..., U n that do not share ancestor modules further upstream, and all corresponding sets of upstream attractors U 1 , U 2 , ..., U n consist only of fixed points. In this case, U consists of i |U i | attractors, all combinations in which one attractor is chosen from each set of upstream attractors U i . The first subtlety arises when upstream modules share ancestor modules further upstream (for instance, if U 1 and U 2 both depend on U α ). In this case, some combinations of attractors will be inconsistent in that they do not have the same values on overlapping nodes in ancestor modules; these inconsistent combinations should be simply removed to not appear in U. The second subtlety occurs with cyclic attractors, those with length > 1. When combining two cyclic attractors U 1 and U 2 of upstream modules, with corresponding lengths 1 and 2 , we must account for all possible phase shifts between the two attractors. This leads to p distinct possible attractors, each of length c , where p is the greatest common divisor of 1 and 2 , and c is their least common multiple.

Computing control kernels
Given our definition of CKs, a brute-force method for computing them is straightforward. For each attractor, we loop over sets of distinguishing nodes of increasing size. Pinning the distinguishing nodes to their values in the attractor, a CK is found when no other attractors exist in the pinned dynamics. Cyclic attractors that are not controllable are easily identified by pinning all constant nodes: if more than one attractor remains in this case, the cycle does not have a CK.
The modular approach to analyzing Boolean dynamical systems can also make the computation of CKs much more efficient. Note that, within a given module, CK nodes can be determined without analyzing downstream modules because any control exerted downstream of the module will not affect it. For this reason, in the modular approach we compute CK nodes at the same time we compute attractors for each module. That is, given each attractor state U of upstream modules and for each attractor of the current module D, we compute CK nodes within D by looping over distinguishing node sets of increasing size (restricted to nodes within D) and pinning them until the dynamics produce a single attractor.
In some cases it is possible to compute control kernels using the modular approach but computationally infeasible to compute minimal distinguishing node sets (because this requires checking every possible set smaller than the minimal size). In these cases we compute an upper bound on the size of the minimal distinguishing node sets (shown as triangles in Figures 7 and 8), equal to at worst the size of the control kernel, and in some cases set by checking a limited subset of smaller distinguishing node sets that have been identified for other attractors in that network.

Sampling analysis
Some networks in the database contain modules that are too large to be analyzed exactly using the above approach. In particular, when a module has more than about 30 nodes, our code is unable to run the analysis using a reasonable amount of time or memory. In these cases, we also attempt a sampling analysis in which we find attractors by initializing the system at N S = 10 4 to 10 6 random states. We then use the brute-force method for finding CKs described above, where a CK is defined as producing the single desired attractor when initializing the system using the same N S random states. We first restrict sampling analysis to networks for which we find fewer than 10 3 attractors.
This sampling analysis initially allowed us to find CK and distinguishing node data for seven additional networks. We then tested the dependence of our results on N S for these seven networks. For three of the networks, we confirmed the previous result when N S was increased by a factor of 10 to 10 5 . We achieved convergence for two of the remaining networks   Mean upper bound on control kernel size Figure 12: The upper bound on control kernel size in Eq. 2 is often tight. Here control kernels and the iterative bound on control kernels were computed for 40 networks using the sampling method.
using a further factor of 10, finding the same attractors using N S = 10 6 . The remaining two networks continued to show non-negligible increase in the number of attractors, particularly for cyclic attractors with small basins. Most of these cycles were not controllable using a static intervention. We removed these two networks from our analysis, which did not significantly alter our results. The five networks analyzed using only the sampling method are represented as blackbordered circles in Figure 1 and unfilled circles in Figure 7.
In Table 1, we list the names of the 49 networks we analyze, those for which we are able to find CKs for all attractors. Those names marked with an asterisk were analyzed using the sampling approach.

Iterative bound on control kernel size
We used the sampling method to run the iterative algorithm that computes the upper bound in Eq. 2 to make Figure 5. This produces iterative bound data for 40 networks. When compared to the true average CK size, the iterative upper bound is often tight (Figure 12).

Random network ensembles
We use two types of random network ensembles, one in which node logic is defined in terms of truth tables and one in which logic is determined by thresholds to activation.
To create random truth tables, we construct p-K networks in which each node depends on K = 2 other randomly chosen nodes. Each node's truth table is constructed such that each of the 2 K = 4 possible states of its input nodes has a probability p of activating the node. We sampled 225 networks from this ensemble, with 75 each having p = 0.25, 0.5, and 0.75, and 25 each within these sets having number of nodes n = 10, 15, and 20. We were successful in finding control kernels for all controllable attractors for all of these sampled networks.
To create random threshold networks, we follow Ref. [6]. The network's dependency structure A is first chosen as an Erdös-Rényi graph with average degree d, and each edge in this graph is assigned with probability p I a value of −1 (representing an inhibitory interaction), or otherwise +1 (representing an excitatory interaction). Each node's state is determined by comparing the sum of the incoming signed inputs s i = j A ij x j (t) to its threshold τ i : We consider two choices for the thresholds τ . In the first case, all thresholds are set to τ = 0; in the second "balanced" case, thresholds are set to half of the sum of incoming edges: τ i = 1 2 j A ij . We sampled 75 networks from this ensemble using each type of threshold, with each combination of d = {1, 2, 3, 4, 5}, p I = {0.1, 0.3, 0.5, 0.7, 0.9}, and n = {10, 15, 20}, for a total of 150 networks. Of these, for 146 we were successful in finding control kernels for all controllable attractors.

Data availability
The biological network models analyzed during the current study are available on the Cell Collective website, https://cellcollective.org/. Data displayed in figures are available in a CSV file accompanying the manuscript.

Code availability
The code to calculate control kernels is available in a ZIP file accompanying the manuscript.

Supplemental material
The effects of asynchronous updating In our study, we assume dynamics in which nodes are updated deterministically, and in a synchronous way. Although this allows for much more straightforward analysis, it is known that more realistic, asynchronous updating schemes create important differences in the types of dynamics displayed by Boolean networks [49,35,50,51]. In this section, we detail those differences and explore how they affect our control analysis. An asynchronous update rule reduces the total number of attractors from r in the synchronous case to a smaller r . Fixed-points are preserved, while the number of cycles is reduced. Before proceeding any further it is important to stress that our logarithmic scaling result stands even when excluding all networks that have any cycles, leaving 24 of the 49 networks that we specifically highlight in Fig. 14F. Furthermore, we do not have a reason to believe that an asynchronous updating scheme would fundamentally change the result for the networks we test. This is because any loss of cyclic attractors in the asynchronous case would cause a corresponding decrease in the size of witness sets and therefore CKs.
More specifically, cycles are not only reduced, but their definition is now made less precise. When node states are updated with a random period, the network no longer transitions through an ordered cycle of states, and basins are now a more useful concept. Distinct cycles correspond now to distinct basins, i.e. regions B k of the configuration space such that the transition probability between states X i in B i and X j in B j is equal to zero if i = j.
Let us now consider the dynamics within one of these basins. If all nodes keep changing for t → ∞, then this "non-constant attractor" is not controllable. In that case, our definition of control kernel becomes irrelevant as it only accounts for static control. Therefore, we can consider cases where at least some nodes within the basin converge to fixed values, while the remaining nodes have non-constant values. Analogously to Ref. [7], we will refer to this pseudo-Boolean vector where some entries remain unassigned as a 'quasi-attractor' of this basin. The quasi-attractor of a fixed point is clearly the fixed-point itself.
We can now proceed in a similar way to the synchronous case.
Distinguishing a fixed-point attractor: The witness sets of these vectors will each be the same size or smaller than their counterparts in the case where all r vectors have all their entries assigned. (This is analogous to distinguishing fixed-points in the synchronous case when considering the additional freedom granted by the cycling components.) The asynchronous CK for a fixed point will never be greater than the size of the synchronous CK.
Distinguishing a "non-constant" attractor: We can proceed in the same way we did with synchronous cyclic attractors. For each quasi-attractor, we restrict the analysis to the n columns where all values are constant. This corresponds to a minimal witness set problem with r attractors and n bits per vector. (Notice that some entries in the remaining vectors might still be non-constant, but this does not increase the size of the witness set.) We know that the reduced n does not have a direct effect on the size of the witness set. Though it is the case that strings of reduced length are more likely to coincide, if the reduced string of our attractor has a duplicate in the set, then the non-constant attractor cannot be controlled (as in the analysis of synchronous cycles). In this case, then, we expect a similar result for the size of witness sets and therefore control kernels: Given the new number r of quasi-attractors, we expect to be able to approximate the control kernel by solving a minimal witness set problem with r vectors, and we therefore expect |CK| ∼ log 2 r .

Relation between controllability and other network attributes
In the main text, we argue that the logarithmic scaling of CK sizes might be expected for networks that have many input nodes or otherwise consist of many small hierarchical modules. This motivates checking whether networks with few input nodes or large modules still obey the scaling. As shown in Fig. 14 C and D, even restricting the analysis to these more "difficult" networks preserves our main result.
Additionally, Fig. 14 A and B indicate the properties of networks in the Cell Collective database that we are unable to analyze due to the computational complexity of identifying all attractors and CKs. Finally, in Fig. 14 E and F, we highlight average control kernel sizes for those networks that do and do not have cyclic attractors (see above supplemental section for a discussion on the effects of asynchronous updating on cyclic attractors).  Figure 14: We may expect networks with many input nodes and small modules to have control kernels with size roughly log 2 r. But even restricting to those with few input nodes and large modules (red points in B and D), the analyzed networks closely follow this logarithmic scaling. A and C: the number of networks with given network statistics among all networks in the Cell Collective database (blue) and the 49 for which we were able to successfully find all control kernels (orange). E and F: Mean control kernel sizes highlighting networks that have cyclic attractors and those that do not.

"Smallest perturbations" of the projective plane construction
We have seen that sets of binary vectors with |w| > log 2 r are possible, and know examples can be built exploiting the geometric properties of "large" finite projective planes. We now want to start exploring how likely these sets of vectors are by considering the smallest perturbations of the projective plane counterexample, i.e. the effect of flipping just one entry in one of its vectors. There are four possible cases to consider: Only one line has two 1s in those same coordinates, let us call it L PQ . We can pin them to 1, and a coordinate corresponding to a different point than P and Q on L PQ to 0. This is enough to identify A, so w A = 3.
Let us check that this does not increase w B , for any other attractor B. L PQ can be identified equally well by a different pair of 1s. Therefore, its w is still 2. The other lines are not affected. Among the other points, only Q is affected by the flip. To identify Q, pinning its 1 value is not enough. We also need to pin coordinates to 0 to exclude the p + 1 lines on it. In particular, we need to exclude L PQ . In doing so, we can pin the characteristic coordinate of P to zero. This will also distinguish Q from A.
Therefore, a 0 → 1 flip in a point only affects the minimal number of distinguishing nodes of that attractor, and reduces it from p + 2 to 3. If we call |w P P | the average of the unperturbed case, eq. (6), and |w P 01 | the new average, |w P P | − |w P 01 | = p + 2 2q With p ≥ 16, this difference is less than 0.03. A single flip of this kind is not enough to bring this configuration below the log 2 r reference level.
1 → 0 in a point ("P10 case") This flip is replacing a point with the null vector, that we will call A. To distinguish A we need to set all coordinates, other than the one we just flipped, to 0. This also distinguishes it from all the lines. w A = q − 1. Besides, this flip does not affect any other vector, as we always need to set at least one coordinate do 1 to distinguish either a point or a line, and this automatically excludes A.
This flip is actually increasing |w| even further. For p = 17, this change is still relatively small, and equal to 0.47, to be compared to log 2 614 = 9.16 and |w P P | | r=614 = 10.5 (p = 13 gives rise to r = 366, which is very close to threshold.).

→ 1 in a line
Let's call P i ,. . . ,P p+1 the points on the line hosting the flip. We now flip the characteristic coordinate of point Q to 1 and call this new attractor A. We can always choose to set the characteristic coordinates of Q to 0 as part of our procedure do distinguish P k from L P k Q . This automatically excludes A. Therefore, we can reduce w k from p + 2 to p + 1 for all P k .
On the contrary, this does not affect w j for j corresponding to the other lines. Lines are identified by two points, and we can always exclude Q in choosing those points. We can do the same with A.
This is less than 0.002 with r = 614.

→ 0 in a line
Let us call P the point identified by the coordinate we flip, and A the new attractor we obtain. Before the flip, A was a line containing P. Now we need to distinguish P from one less line. No other attractor is affected.
With r = 614, this difference is a negligible 0.07. These considerations show that this projective plane counterexample is stable to the smallest changes to its initial configuration. But, it is important to notice that these perturbations do not really affect the strong bias of zeroes to ones. When these vectors are interpreted as attractor states, the relevance of sets with such a strong bias in modeling real world system remains dubious.
These examples are compared to log 2 r in Fig. 15. Notice that the increase in |w| due to a P10 perturbation, eq. (12), also reduces the minimal order p for which |w| > log 2 r from 17 to 13, and r from 614 to 366. Control kernels and network circuits Figure 17 shows the overlap among sets of controlling nodes using alternative methods, as discussed in the main text.

Validating attractors
Our results depend crucially on correctly identifying the number of attractors in each network. In cases of relatively small networks, it is possible to check the results of our modular approach with a more direct approach that simply finds all closed loops in the explicitly enumerated state transition graph. For all such cases, consisting of 22 biological and 375 random networks, we have checked that the number of attractors matches exactly using the two techniques. In addition, we have checked that the biological cases analyzed using stable motif analysis have the same number of attractors in the 15 cases of networks we were able to analyze that have only fixed point attractors (where the definitions of our attractors and stable motif's quasi-attractors match).