Reconciling qualitative, abstract, and scalable modeling of biological networks

Predicting biological systems’ behaviors requires taking into account many molecular and genetic elements for which limited information is available past a global knowledge of their pairwise interactions. Logical modeling, notably with Boolean Networks (BNs), is a well-established approach that enables reasoning on the qualitative dynamics of networks. Several dynamical interpretations of BNs have been proposed. The synchronous and (fully) asynchronous ones are the most prominent, where the value of either all or only one component can change at each step. Here we prove that, besides being costly to analyze, these usual interpretations can preclude the prediction of certain behaviors observed in quantitative systems. We introduce an execution paradigm, the Most Permissive Boolean Networks (MPBNs), which offers the formal guarantee not to miss any behavior achievable by a quantitative model following the same logic. Moreover, MPBNs significantly reduce the complexity of dynamical analysis, enabling to model genome-scale networks.

Similarly, in the same paragraph the authors write, "...there is no guarantee that their analysis can be relevant for a more precise model..." It seems to me that a successful BN aims to capture the broad behavior of the system being modeled. While it is certainly the case that a BN will not capture finer details, and that spurious predictions can occur, it seems to me that suggesting a strong BN is not relevant without qualification is perhaps too strong. Fig. 1d is identified in the caption as showing the "complete dynamics of f", but it seems to me this is not correct for two reaons. First, in a 3-node BN there are 2^3 = 8 states, whereas the authors here show 3. Second, The first transition, from state 000 to state 100, can only occur based on a changing input signal rather than from the internal dynamics dictated by the update rules. It seems more natural here to show the complete state transition network. If desired one could indicate (with a different type of edge, perhaps) transitions that can occur by forcing the input signal to change (i.e. switching the state of node 1 from 0 to 1 or vice versa) to distinguish it from the natural dynamics of the system. Moreover, I don't think the authors are making a particularly strong case against BNs here; to my eye the takeaway from this example is "this BN does not capture the experimental behavior of the system", which simply suggests a different BN should be chosen. (Indeed, this neatly describes a common situation that arises when BNs are developed.) To be compelling here the authors need to show that -no-BN can show transient activation (which is not possible, see below) or shift their argument to make clear that the "ON vs. OFF" nature of BNs inherently fails to capture the detail of a continuous model.

Comment 2
A three-node network that does capture transient activation is f1 = signal (fix to ON) f2 = x1 f3 = not (x1 and x2) In a synchronous update scheme, the following sequence of states can be observed: 100 -> 111 -> 110 In short, it seems to me that Fig. 1 and the main text pertaining to it (through the beginning of pg2) should be revised.
Comment 3 On pg2 col1 the authors write, "Therefore, the validity of a model cannot be assessed by the usual interpretations of BNs." The meaning of "usual interpretations" here (and in the following paragraph) should be clarified.

Comment 4
On pg3 col2 the authors write, "with an erroneous conclusion that its underlying influence graph is wrong" when discussing the failure of a BN to capture observed dynamic behavior (Fig. 2). This does not seem correct to me; a BN is not fully defined by its influence graph (Fig. 2a) but rather by its update rules (Fig. 2b). Similar to my comment 2, it seems to me here that one would first search for a different parameterization of the influence graph that succeeds in capturing the observed behavior.
Comment 5 On pg4 col1, the authors write, "A component i can change to the increasing (resp. decreasing)..." and then later "Once in increasing (resp. decreasing) state, it can reach 1 (resp. 0) at any time." This second quote makes it clear that the first quote is referring to changing from state 0 to increasing and changing from state 1 to decreasing. I suggest changing the first quote to make this explicit.
Indeed, it might be useful to indicate near this text what transitions are possible between the 4 states (0, inc, dec, 1), as the text does not address whether it is possible to go back from increasing to 0 and from decreasing back to 1. (A table might work well.) Moving to the "derivative of a continuous model" framework, one could argue that you start by going from 1 (a maximum) to decreasing and proceed by moving back to 1 without first passing through a minimum (0) and then a region with a positive slope (increasing). However, in the Boolean reduction one could consider such fluctuations occurring uniformly above or below the activation/deactivation threshold. Thus some interpretation here would be useful.

Comment 6
On pg5 col1 the authors introduce some notation (e.g. P^{NP}), which should be defined for readers who are unfamiliar with it.

Comment 7
In the SI it appears that several Latex references are broken (e.g. the beginning of section 3).

Response letter
We are grateful to the reviewers for their positive feedback and suggestions for strengthening the presentation of the results. Besides improvements on the phrasing of different statements, this revision brings the following main changes: • We clarified the introductory example settings, by better justifying the choice of 1) initial conditions, 2) observations coming from theoretical analysis of quantitative models, and 3) experimental data from synthetically-designed circuits; • We present a short analysis showing that no (a)synchronous BN exists that respect the I3-FFL motif and the corresponding observed behaviors, strengthening the points around network motif validation which can be hindered by the pure Boolean (a)synchronous interpretation; • We present a more detailed description of the analyses performed for the case studies to better emphasize the computational gain when computing reachable attractors, while MPBNs remain stringent enough to reproduce previous predictions.
Please see below our point-by-point response to each reviewer. A PDF tracking the changes made in the text is enclosed.

Reviewer #1
1. The manuscript focuses on reachability-related questions. Yet, there are other important questions that Boolean models aim to answer, for example determining the complete attractor repertoire of a system, and determining the basins of attraction of each attractor. Would these questions also be easier to answer with Most Permissive Boolean Networks, or would these still be done in the traditional way, and the MPBNs would only be used to determine reachability?
The complexity gain also applies to attractors and reachable attractors. Actually, a large part of the case studies on biological and randomly generated network involved the computation of reachable attractors and showed an improvement of computation time by several orders of magnitude. Determining whether a configuration belongs to the basin of a given attractor also boils down to the attractor reachability problem. We clarified these results in several places in the revised manuscript: in the complexity section (p5, lines 428-442), where we added a paragraph to address the reachability of attractors, and in the case studies section (p6, lines 465-474), where we made explicit the fact that case studies rely on the determination of attractors and their reachability.
2. Figure 1 gives an example where a traditional Boolean model (whether updated synchronously or updating one node at a time) would miss an observed non-monotonic behavior of a system component. Yet, the exact behavior is incompletely specified, and thus the example is not completely convincing. If one started the Boolean model from the initial condition x1=0, x2=1, x3=0, it would be possible for x3 to increase to 1 and then to decrease, thus qualitatively reproducing the trajectory in panel c. Why is only the initial state 000 considered in the Boolean model?
The initial state 000 is dictated by the theoretical and experimental studies of the I3-FFL network, showing in both cases that, starting from all node being inactive, an increase of the signal can lead (under some range of kinetics parameters) to transient activation of the output. We updated the text in the introduction to clarify this point (lines 48-53).
3. The presentation of the two applications is brief and not sufficiently informative. The Supporting Information has a single paragraph on biological applications. "Correctly predicted the loss of reachability of apoptotic attractors upon the mutations of p53 and NICD presented in the study (Fig.  S2)" -this is reproduction of a single finding from the reference; Fig. S2 reproduces the influence graph of the reference. "In the case of T-cell differentiation (9), MPBNs recovered the same reprogramming graph between T-cell types (Fig. S3)" Fig. S3  4. There should be an illustration of MPBNs used to determine the attractors reachable from a given initial condition.
Two of the case studies on models from the literature and the case study on very large random BNs actually relied on computing reachable attractors from specific initial conditions. We improved the description of these case studies to make it explicit (p6, lines 465-502) This statement comes from the fact that an (a)synchronous analysis can rule out a BN which actually encodes the correct logic of activations (as it is the case with the introductory example): if there exists no other BN compatible with the network motif and the expected dynamics, the analysis will conclude that the motif is not sufficient to reproduce the behavior. We reworded this sentence to clarify this idea (lines 68-72) Moreover, we added an analysis of the I3-FFL network motif demonstrating this issue: it turns out that the BN used in the introductory example is the only one matching the network motif and can reproduce the behavior observed with quantitative models and synthetically-designed circuits. Therefore, in this case, an (a)synchronous analysis would mis-conclude that the network is not sufficient. We included a new paragraph (p4, lines 265-272) explaining this result.
In line 439 "MPBNs are formally guaranteed never to ignore behaviors hidden by artifacts of usual Boolean modeling". Is the intended meaning "formally guaranteed to capture behaviors that only multi-level discrete models could capture before"?
We thank the reviewer for this suggestion which we adopted.

What does "*" stand for in Equation [3]?
The notation "→*" referred to a (possibly empty) sequence of transitions. We modified [3] to remove this abbreviation.

What does P^{NP} mean in line 331?
This complexity class denotes problems that can be solved in polynomial time, assuming problems of class NP can be solved in one instruction (known as an oracle).P^{NP} is known to be harder than NP, but simpler than PSPACE, i.e., simpler than reachability and attractors with (a)synchronous BNs. We modified the relevant paragraph to improve the explanation of this complexity class (p5, lines 352-357).

In many instances the Supporting information has question marks instead of references.
We apologize for this mistake of compilation of the LaTeX document.

Reviewer #2
Comment In short, it seems to me that Fig. 1 and the main text pertaining to it (through the beginning of pg2) should be revised.
The I3-FFL network possesses a fixed logic, and we focussed on behaviors observed from a system where all nodes are initially inactive. Theoretical and experimental studies showed a transient activation of the output with this logic and initial state. The logic of the BN and its initial configuration reflect these settings and are correct with respect to the system: the theoretical studies used quantitative models, and the experimental study relied on synthetically-designed circuits, both reflecting the I3-FFL logic. Under some range of kinetic parameters and starting with all node inactive, they observed the transient activation of node 3 when increasing the signal.
The complete aspect was implicitly subject to starting from 000 . We updated the text and the legend of Fig. 1 to clarify these points (lines 43-57).
Moreover, we added an analysis of the I3-FFL network motif showing that this BN is the only one matching with the network motif and able to reproduce the behavior observed with quantitative models and synthetically-designed circuits. Note that in network suggested by the reviewer, node 2 is transformed into an inhibitor of 3 (instead of activator), making it a coherent FFL. We included a new paragraph (p4, lines 265-272) explaining this result.

Comment 3
On pg2 col1 the authors write, "Therefore, the validity of a model cannot be assessed by the usual interpretations of BNs." The meaning of "usual interpretations" here (and in the following paragraph) should be clarified.
By "usual" we meant by using the synchronous/asynchronous interpretations, and all their sub-cases. We rephrased with mentioning explicitly the (a)synchronous term.

Comment 4
On pg3 col2 the authors write, "with an erroneous conclusion that its underlying influence graph is wrong" when discussing the failure of a BN to capture observed dynamic behavior (Fig. 2). This does not seem correct to me; a BN is not fully defined by its influence graph ( Fig. 2a) but rather by its update rules (Fig. 2b). Similar to my comment 2, it seems to me here that one would first search for a different parameterization of the influence graph that succeeds in capturing the observed behavior.
Indeed, we made an unnecessary shortcut. We now replaced "underlying influence graph" by "logic" to avoid confusion. Note that all logics compatible with the influence graph may fail with (a)synchronous analysis, whereas one succeeds with MP. And it is the case with the I3-FFL network motif, where an (a)synchronous analysis would mis-conclude that the network is not sufficient. As already mentioned above, we included a new paragraph (p4, lines 265-272) explaining this result. Either 0 or decreasing states can change to increasing , and either 1 or increase states can change to decreasing. It is however impossible to go directly from increasing to 0 without going through the decreasing state, and from decreasing to 1 without going through the increasing state. We added this precision to the paragraph (p4, lines 243-254). Fig 3 recapitulates the possible transitions between the 4 states. Therefore, a fluctuating system can be simulated in MPBNs both by a steady state at 1 (i.e., the fluctuation stay above all the interactions thresholds) and by oscillations 1 -> decreasing -> increasing -> 1 without having to pass through the 0 state, but still matching the slopes.

Comment 6
On pg5 col1 the authors introduce some notation (e.g. P^{NP}), which should be defined for readers who are unfamiliar with it.
This complexity class denotes problems that can be solved in polynomial time, assuming problems of class NP can be solved in one instruction (known as a oracle). P^{NP} is known to be harder than NP, but simpler than PSPACE, i.e., simpler than reachability and attractors with (a)synchronous BNs. We modified the relevant paragraph to improve the explanation of this complexity class (p5, lines 352-357).

79
However, we found that this issue is actually due to the where B = {0, 1} represent the Boolean values. Each element 113 of a binary vector models the state (inactive/active, absent/p-114 resent) of the associated network node, and fi is the function 115 which specifies the state towards which the i-th element evolves. 116 Fig. 2(b) gives an example of a BN modeling a switch system. 117 BNs semantics computes the possible temporal evolutions 118 of the component states using different methods. With syn-119 chronous executions of BNs (introduced by S. Kauffman (1)), 120 we update all the components of the network at the same 121 time, and a configuration x ∈ B n can only evolve to one con-122 figuration f (x). With fully asynchronous executions of BNs 123 (introduced by R. Thomas and usually referred to more simply 124 as asynchronous in the computational systems biology litera-125 ture), we update only one component at a given time, and a 126 configuration x ∈ B n can evolve to any configuration which 127 differs only by a single component i where fi(x) = xi. This 128 introduces potential non-determinism in the model trajectory 129 since there can be different executions of the same BN from a 130 given initial configuration. The (fully) asynchronous seman-131 tics is often described as more realistic for modeling biological 132 networks, accounting for different kinetics of interactions. where ∆(x, y) is the list of components which state differs 145 between x and y, i.e., ∆(x, y) = {i ∈ {1, . . . , n} | xi = yi}.

146
A configuration y ∈ B n is reachable from x ∈ B n if either 147 x = y, or there exists a sequence of transitions from x to y: Notice that if y / ∈ ρ f a (x), then it is impossible to evolve from x 150 to y according to any of the semantics defined above, including 151 the synchronous and (fully) asynchronous ones.  Reachability is a fundamental property to assess the com-156 patibility of BN models with time series data: if none of the 157 configurations matching an observation at a given time is 158 reachable from any configuration matching an experimental 159 observation at an earlier time, the BN cannot predict :::::: capture 160 the observed behaviors.   One could use any of these frameworks to model the same In other words, does F specify a system with more quantitative 193 information than f , but follows the same (Boolean) logic for 194 the interactions. 195 We consider here a simple mathematical criterion for refine- In Fig. 2 we present a simple example of BN for which asyn-213 chronous executions miss possible behaviors of the network 214 when considering a multivalued refinement of it. The MN in 215 Fig. 2(d) is a refinement of the BN in Fig. 2(b). In addition 216 to the higher granularity for the activity levels of all three 217 components, it brings additional information on the activation 218 of component 3. An intermediate value of 2 is sufficient to 219 activate 3 provided that the value of its inhibitor 1 is not high. 220 One of its asynchronous execution shown in Fig. 2(e) predicts 221 that the three components can get activated simultaneously, 222 which was never predicted by any of the asynchronous execu-223 tions of the BN. Assuming the validation of the model were 224 subject to the reachability of a configuration with all the three 225 components active from a configuration with all the compo-226 nents inactive, this BN model would be deemed insufficient for 227 achieving the observed behavior, with an erroneous conclusion 228 that its underlying influence graph ::: logic : is wrong.   Fig. 2(b) starting from the configuration where all genes are inactive. Note that it correctly recovers the (transient) reachability of the configuration where the three components are active.

281
Formal guarantees for model refinements. Using the simple 282 examples in Fig. 1 and Fig. 2, we have shown that BN refine-283 ments can introduce behaviors that cannot be captured with 284 classical semantics.

366
::: and ::: are ::::::: subject :: to ::: the :::::::: following :::::::::: properties: ::::::::: NP ⊆ P NP Moreover, finding this shortcut requires exploring at most 382 a quadratic number of transitions in the general case, and 383 only 3n whenever the target configuration is in an attractor. 384 The exploration consists of performing as many transitions as 385 possible of the first phase, putting the largest possible number 386 of components in a dynamic state. For each component whose 387 state does not change between the starting and target config-388 uration, it is then necessary to switch the dynamic state back 389 (second phase). If this is not possible, then the exploration 390 is repeated from the beginning while preventing this specific 391 component from changing to a dynamic state (as it would still 392 be impossible to go back to the initial binary state, and the 393 target configuration would not be part of an attractor). Over-394 all, the exploration is thus repeated at most n times. Finally, 395 all the transitions of the third phase are applied, which should 396 lead to the target configuration if and only if it is reachable.

397
On the other hand, determining the possibility of a most-398 permissive transition is NP-complete in the general case: in-399 deed, the condition "∃z ∈ γ(x) : fi(z) = 1" in Fig. 3 is the 400 SAT problem. For biological networks, it is usual to assume 401 that components cannot have both positive (activator) and 402 negative (inhibitor) direct influences. The resulting BNs are 403 called locally monotonic: each local function fi is monotonic 404 for every component it depends on: increasing the number 405 of activators (resp. inhibitors) of i in state 1 can only in-406 crease (resp. decrease) the value of fi. Thus, determining the 407 existence of a Boolean interpretation z of a most-permissive 408 configuration x so that fi(z) = 1 comes down to considering 409 activators in dynamic state as 1 and inhibitors in dynamic 410 state as 0, and conversely for fi(z) = 0. Therefore, determin-411 ing the possibility of a most-permissive transition can be done 412 in linear time with locally-monotonic BNs.

413
The reachability problem in MPBNs can thus be solved 414 in polynomial time whenever f is locally monotonic (Theo-415 rem 3 in SI.2), a considerable drop in complexity compared 416 to synchronous or asynchronous BNs where the problem is 417 PSPACE-complete. With non-locally monotonic BNs, the 418 reachability problem is in P NP .

419
While the attractors of asynchronous BNs can be complex 420 objects, the attractors of MPBNs are particular mathematical 421 objects called minimal trap spaces. A trap space is a hypercube 422 which is closed by f : for any vertex x, f (x) is also a vertex. A 423 trap space is minimal whenever it does not include a different 424 trap space. Attractors in MPBNs have this regular structure 425 because whenever two configurations lying on any diagonal of 426 an hypercube are reachable from each other, they can reach 427 the adjacent configurations as well.

535
The choice of the dynamical interpretation of BNs has drastic 536 effects on their predictions. Whereas the (fully) asynchronous 537 BN interpretation is often advised for practical applications, it 538 overlooks behaviors emerging from different timescales for the 539 interactions, leading to biases when selecting plausible models. 540 Such misses are due to artifacts of configurations updates. 541 On the contrary, MPBNs offer a framework for reasoning on 542 the qualitative dynamics without making any strong a priori 543 hypothesis about the timescale and thresholds of interactions, 544 and without additional parameter.

545
The state-space explosion triggered by the usual 546 :::::::::: synchronous :::: and ::::::::::: asynchronous : interpretations of BNs is an-547 other significant bottleneck for their application in systems 548 biology (3, 32). MPBNs offer drastic gains in computational 549 complexity when analyzing possible trajectories and attrac-550 tors, both elementary and essential properties, underpining 551 the potential of a model. In practice, the verification of these 552 properties with asynchronous BNs is typically limited to net-553 works with 50 to 100 nodes. On the contrary, deciding the 554 reachability and attractor properties in MPBNs relies on scal-555 able algorithms and does not suffer from the state-space ex-556 plosion. For the case of locally-monotonic BNs, which is a 557 classical hypothesis for biological networks, the complexity 558 allows addressing very large scale networks, as illustrated in 559 SI.3, with experiments on BNs with up to 100,000 components. 560 Our software tool mpbn is available at and integrated in the 561 CoLoMoTo notebook environment (33).

D R A F T
The prediction of attractors reachable from specific initial 563 conditions, and possibly under various mutant conditions, is at 564 the core of many studies using logical models. While MPBNs 565 can identify the complete set of reachable attractors several 566 orders of magnitude faster than asynchronous BNs, the quan-567 tification of the propensities of each attractor, e.g., performed 568 by sampling the trajectories (23,30), is yet to be explored. In learning of large-scale logical models from experimental data.